We study the achievable error exponents in joint source-channel coding by deriving an upper bound
on the average error probability using Gallager's techniques. The bound is based on a construction for
which source messages are assigned to disjoint subsets (referred to as classes), and codewords are
independently generated according to a distribution that depends on the class of the source message.
Particularizing the bound to discrete memoryless systems, we show that two optimally chosen classes
and product distributions are necessary and sufficient to attain the sphere-packing exponent in those
cases where it is tight. Finally, we prove that the very same results extend to lossy joint source-channel
coding for sources and distortion measures that make the source reliability function convex.
I. Introduction

In [1], Shannon proved the source-channel coding theorem for stationary memoryless sources
and channels. The direct part of the theorem states that a source of entropy $H(V)$ can be
transmitted over a channel of capacity $C$ with vanishing error probability as the block length
grows large if $H(V) < C$. Conversely, the error probability is bounded away from zero if
$H(V) > C$. For the achievability part, Shannon used separate source-channel coding, indirectly
showing that the concatenation of source and channel codes suffices to asymptotically achieve 
vanishing error probability. Yet, for a fixed block length, the error probability is smaller with 
jointly designed source-channel codes [2]. This reduction in error probability has been quantified 
in terms of error exponents, defined as the asymptotic exponential rate of decay of the error 
probability in the block length as the block length tends to infinity [2], [3]: the error exponent 
of joint design is at most twice that of the concatenation of source and channel codes [4]. These 
results are usually derived using random-coding arguments [1], [5], i.e., by considering codebooks 
whose codewords have been randomly generated, and by then analyzing the ensemble-average 
error probability. As there exists at least one code in the ensemble whose error probability is 
not larger than that of the ensemble-average, one concludes the existence of good codes with 
small error probability. 

A number of results in information theory [1], [6], including the derivation of bounds on the 
error exponents, can be derived by means of the method of types [7], [8]. Inter alia, Csiszar 
derived an achievable exponent for source-channel coding by drawing codewords at random 
from a set of sequences with (at most) a fixed polynomial number of types [2] and with a 
composition that depends on the source message. He also showed that the exponent coincides 
with an extension of the channel-coding sphere-packing exponent [9] in a certain rate region. 

In contrast, Gallager [3] derived an achievable exponent using random-coding methods, where 
codewords are drawn according to a product distribution independent of the source message and 
do not necessarily have a fixed type. This method, which naturally extends to channels with 
continuous alphabets and memory, yields a simple derivation of the channel-coding exponent in 
discrete memoryless channels [3, Th. 5.6.2]. However, the straightforward application to source- 
channel coding gives a (generally) weaker achievable exponent than Csiszar's method. Although 
this difference is typically small [4], the methods used to derive each exponent are conceptually 
different, which raises the question of whether the difference lies in the composition of
codewords (fixed-composition vs. product-distribution), in the ensemble choice (varying code- 
word distribution for different source messages vs. identically distributed codewords), or in the 
bounding technique of the average error probability (method of types vs. Gallager's techniques). 
This can be summarized in a number of questions: 

1) Can the sphere-packing exponent be attained with random codes generated by product 
distributions? 

2) Do codeword distributions need to be source-message-dependent? 

3) Do Gallager's bounding techniques suffice to derive Csiszar's exponent? 

4) Does the formula for the best exponent hold beyond discrete memoryless systems? 

In this paper, we answer the first three questions in the positive by highlighting the im- 
portance of the ensemble choice. Specifically, we show that both product-distribution and fixed- 
composition codes can attain the sphere-packing exponent in the cases where it is tight. However, 
the codewords associated to different source messages may need to be generated according to 
different product distributions. To show this, we construct a generic ensemble that encompasses 
product-distribution and fixed-composition ensembles and apply Gallager's bounding techniques 
to derive a good upper bound on the ensemble-average error probability. We then find that the 
exponential rate of decay of this bound coincides with Csiszar's exponent when at most two 
different product distributions (and analogously two codeword types) are employed to generate 
the ensemble. Some of our results naturally extend to channels with continuous alphabets and 
source-channel pairs with memory, partly answering the fourth question. 

The paper is structured as follows. In Section II we review related previous work on source- 
channel coding. Section III, the main section of the paper, presents the new random coding 
bound and demonstrates that it recovers existing bounds on the error exponent. These results 
are then applied in Section IV to lossy source-channel coding. Finally, we conclude in Section 
V with some final remarks. Proofs of several results can be found in the appendices. 

A. Notation and definitions 

An encoder maps a source message $v$ to a length-$n$ codeword $x(v)$, which is then transmitted
over the channel and decoded as $\hat{v}$ at the receiver upon observation of the output $y$, see Fig. 1.
The source is characterized by a distribution $P_V^k(v) = \prod_{j=1}^{k} P_V(v_j)$, $v = (v_1, \ldots, v_k) \in \mathcal{V}^k$,
where $\mathcal{V}$ is an alphabet of cardinality $|\mathcal{V}|$. The channel law is given by a conditional probability
distribution $P_{Y|X}^n(y|x) = \prod_{j=1}^{n} P_{Y|X}(y_j|x_j)$, $x = (x_1, \ldots, x_n) \in \mathcal{X}^n$, $y = (y_1, \ldots, y_n) \in \mathcal{Y}^n$,
where $\mathcal{X}$ and $\mathcal{Y}$ denote the input and output alphabet, respectively. While $\mathcal{X}$ and $\mathcal{Y}$ are assumed
discrete for ease of exposition, a number of achievability results presented in this paper extend
in a natural way to continuous alphabets.

Figure 1. Block diagram of JSCC.

Based on the output $y$, the decoder guesses a source message $\hat{v}$ according to the maximum
a posteriori (MAP) criterion, i.e.,

$\hat{v} = \arg\max_{v} P_V^k(v) P_{Y|X}^n(y|x(v))$. (1)

Where unambiguous, we simplify notation by writing $x$ instead of $x(v)$. Throughout the paper,
we shall use the notation $A \sim P_A$ to indicate that $A$ is distributed according to the distribution
$P_A$.

We study the average error probability $\epsilon$, defined as

$\epsilon \triangleq \Pr\{\hat{V} \neq V\}$, (2)

where capital letters are used here, and throughout the paper, to denote random variables.
Specifically, we study the exponential decay of the average error probability. Consider a sequence of
sources with length $k = 1, 2, \ldots$ and a corresponding sequence of codes of length $n = n_1, n_2, \ldots$
We shall assume that the ratio $k/n$ converges and refer to $t = \lim_{k\to\infty} k/n$ as the transmission rate.
We say that an exponent $E > 0$ is achievable if there exists a sequence of sources and codes
such that the error probability $\epsilon$ satisfies

$\epsilon \le e^{-nE + o(n)}$, (3)

where $o(n)$ is a sequence such that $\lim_{n\to\infty} o(n)/n = 0$. For fixed $P_V$ and $P_{Y|X}$, the reliability
function $E_J$ is defined as the supremum of all achievable error exponents $E$.
II. Previous work

We first summarize results in the literature on the reliability functions of source and channel
coding that will be used in the rest of the paper. For source coding, the reliability function of a
source $P_V$ at rate $R$, $e(R, P_V)$, is given by [10], [11]

$e(R, P_V) = \min_{Q: H(Q) \ge R} D(P_Q \| P_V)$ (4)
$\phantom{e(R, P_V)} = \sup_{\rho \ge 0} \{\rho R - E_s(\rho, P_V)\}$, (5)

with $D(\cdot\|\cdot)$ denoting the divergence between two distributions, $Q \sim P_Q$ being a dummy random
variable, and where $E_s(\rho, P_V)$ denotes Gallager's source function,

$E_s(\rho, P_V) \triangleq \log \left( \sum_{v} P_V(v)^{\frac{1}{1+\rho}} \right)^{1+\rho}$. (6)

Here, and throughout the paper, we avoid writing the sets explicitly in minimizations and
summations performed over the entire set.
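As a numerical illustration of (5) and (6), the following minimal sketch evaluates Gallager's source function and approximates the supremum in (5) on a grid of $\rho$ values. The function names, the grid range `rho_max`, and the number of grid points are assumptions made for this example; natural logarithms are used, so all quantities are in nats.

```python
import math

def source_function(rho, p_v):
    """Gallager's source function E_s(rho, P_V) of (6), in nats."""
    s = 1.0 / (1.0 + rho)
    return (1.0 + rho) * math.log(sum(p ** s for p in p_v if p > 0))

def entropy(p_v):
    """Entropy H(V) in nats."""
    return -sum(p * math.log(p) for p in p_v if p > 0)

def source_reliability(rate, p_v, rho_max=50.0, steps=20000):
    """Grid approximation of sup_{rho >= 0} {rho * R - E_s(rho, P_V)} in (5).

    rho_max and steps are illustrative choices, not values from the paper.
    """
    best = 0.0  # the value at rho = 0 is exactly 0
    for i in range(1, steps + 1):
        rho = rho_max * i / steps
        best = max(best, rho * rate - source_function(rho, p_v))
    return best
```

For a binary source, the result can be compared against the divergence form (4): at a rate $R = H(Q)$ above $H(V)$, the grid value should agree with $D(P_Q \| P_V)$ up to the grid resolution, and it is zero for rates below $H(V)$.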

In the interval $H(V) \le R \le \log|\mathcal{V}|$, (4) becomes [2, eq. (7)]

$e(R, P_V) = \min_{Q: H(Q) = R} D(P_Q \| P_V)$. (7)

For channel coding, the reliability function of a channel $P_{Y|X}$ at rate $R$, $E(R, P_{Y|X})$, is
bounded as

$E_r(R, P_{Y|X}) \le E(R, P_{Y|X}) \le E_{sp}(R, P_{Y|X})$, (8)

where $E_r(R, P_{Y|X})$ is the random-coding exponent, given by [3]

$E_r(R, P_{Y|X}) = \max_{\rho \in [0,1]} \{E_0(\rho, P_{Y|X}) - \rho R\}$, (9)

and $E_{sp}(R, P_{Y|X})$ is the sphere-packing exponent, given by [9]

$E_{sp}(R, P_{Y|X}) = \sup_{\rho \ge 0} \{E_0(\rho, P_{Y|X}) - \rho R\}$. (10)

In (9) and (10), we define $E_0(\rho, P_{Y|X}) \triangleq \max_{P_X} E_0(\rho, P_{Y|X}, P_X)$ with $E_0(\rho, P_{Y|X}, P_X)$
denoting Gallager's channel function

$E_0(\rho, P_{Y|X}, P_X) \triangleq -\log \sum_{y} \left( \sum_{x} P_X(x) P_{Y|X}(y|x)^{\frac{1}{1+\rho}} \right)^{1+\rho}$. (11)
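The channel function (11) and the random-coding exponent (9) for a fixed input distribution can be evaluated directly. The sketch below is illustrative only: the function names, the row-per-input matrix layout `channel[x][y]`, and the grid over $\rho$ are assumptions of this example, and the maximization over $P_X$ in the definition of $E_0(\rho, P_{Y|X})$ is not performed.

```python
import math

def channel_function(rho, channel, p_x):
    """Gallager's channel function E_0(rho, P_{Y|X}, P_X) of (11), in nats.

    channel[x][y] holds P_{Y|X}(y|x); p_x[x] holds P_X(x).
    """
    s = 1.0 / (1.0 + rho)
    total = 0.0
    for y in range(len(channel[0])):
        inner = sum(px * channel[x][y] ** s for x, px in enumerate(p_x) if px > 0)
        total += inner ** (1.0 + rho)
    return -math.log(total)

def random_coding_exponent(rate, channel, p_x, steps=1000):
    """Grid approximation of max over rho in [0, 1] of E_0 - rho * R for a fixed P_X."""
    return max(
        channel_function(i / steps, channel, p_x) - (i / steps) * rate
        for i in range(steps + 1)
    )
```

For a binary symmetric channel with crossover probability 0.1 and uniform inputs, `channel_function(1.0, ...)` equals $\log 2 - \log(1 + 2\sqrt{0.09})$, the cutoff rate, which gives a quick sanity check of the implementation.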

In [3, Prob. 5.16], Gallager used a random-coding argument to derive an upper bound on the 
average error probability of source-channel coding by drawing the codewords independently of 
the source messages according to a given distribution $P_X^n$. He showed that, for every $P_X^n$ and
every $\rho \in [0, 1]$, there exists a code whose error probability satisfies

$\epsilon \le e^{-\left(E_0(\rho, P_{Y|X}^n, P_X^n) - E_s(\rho, P_V^k)\right)}$. (12)

For a product distribution, $P_X^n(x) = \prod_{j=1}^{n} P_X(x_j)$, (12) specializes to

$\epsilon \le e^{-n E_0(\rho, P_{Y|X}, P_X) + k E_s(\rho, P_V)}$. (13)

Thus, the error probability $\epsilon$ vanishes exponentially in $n$ with exponent $E_0(\rho, P_{Y|X}, P_X) -
tE_s(\rho, P_V)$. By minimizing (13) over $P_X$ and $\rho$, Gallager obtained the achievable exponent

$E_J^{G} \triangleq \max_{\rho \in [0,1]} \{E_0(\rho, P_{Y|X}) - tE_s(\rho, P_V)\}$. (14)

It can be shown that this exponent is positive whenever $tH(V) < C$.

Csiszar refined this result using the method of types [2]. Csiszar's approach is different from 
Gallager's in several aspects. Firstly, Csiszar partitions the message set into source-type classes 
and considers fixed-composition codes that map messages within a source type onto sequences 
within a channel-input type. Secondly, a suboptimal maximum mutual information decoder is 
used at the receiver. This decoder first decides on the source type that is being transmitted and 
then on the source message within the type. Finally, Csiszar uses a channel-coding result for 
messages with unequal error protection [2, Th. 5] to prove that for every $\delta > 0$, there exists an
$n_0 \in \mathbb{N}$ such that, for $n \ge n_0$, the probability of error $\epsilon$ can be upper-bounded as

g < ^^-ke{'^,Pv)-nE,(R.,PY\x)-2S ^ ^-^5^ 
1=1 

where $N_k$ denotes the number of source-type classes. This yields the achievable exponent

$E_J^{Cs} \triangleq \min_{tH(V) \le R \le R_V} \left\{ t e\left(\frac{R}{t}, P_V\right) + E_r(R, P_{Y|X}) \right\}$, (16)

where $R_V \triangleq t \log|\mathcal{V}|$. A convenient alternative representation of $E_J^{Cs}$ was obtained by Zhong et
al. [4] via Fenchel's duality theorem [12, Thm. 31.1]:

$E_J^{Cs} = \max_{\rho \in [0,1]} \{\bar{E}_0(\rho, P_{Y|X}) - tE_s(\rho, P_V)\}$, (17)

where $\bar{E}_0(\rho, P_{Y|X})$ denotes the concave hull of $E_0(\rho, P_{Y|X})$, defined as the pointwise infimum
over the family of affine functions that upper-bound $E_0(\rho, P_{Y|X})$ as a function of $\rho \in [0, 1]$ [13,
Cor. 12.1.1]. For completeness, we provide a direct derivation of this form of $E_J^{Cs}$ in Appendix I.
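The concave-hull operation in (17) can be approximated numerically when $E_0(\rho, P_{Y|X})$ is available on a grid of $\rho$ values. The sketch below computes the upper concave envelope of sampled points with a monotone-chain scan and then evaluates the maximizations in (14) and (17) on the grid. The function names and the sampled-grid approach are assumptions of this illustration, not the method used in the paper.

```python
def concave_hull(xs, ys):
    """Upper concave envelope of the points (xs[i], ys[i]), evaluated at xs.

    xs must be strictly increasing and contain at least two points.
    """
    hull = []  # indices of the vertices of the upper hull
    for i in range(len(xs)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (xs[b] - xs[a]) * (ys[i] - ys[a]) - (ys[b] - ys[a]) * (xs[i] - xs[a])
            if cross >= 0:
                hull.pop()  # vertex b lies on or below the chord from a to i
            else:
                break
        hull.append(i)
    envelope = []
    seg = 0
    for i, x in enumerate(xs):
        while hull[seg + 1] < i:
            seg += 1
        a, b = hull[seg], hull[seg + 1]
        w = (x - xs[a]) / (xs[b] - xs[a])
        envelope.append((1.0 - w) * ys[a] + w * ys[b])
    return envelope

def exponent_on_grid(rhos, e0_values, es_values, t, use_hull):
    """Grid version of (14) (use_hull=False) or (17) (use_hull=True)."""
    values = concave_hull(rhos, e0_values) if use_hull else e0_values
    return max(e0 - t * es for e0, es in zip(values, es_values))
```

When the sampled $E_0$ curve is already concave, both calls return the same value; a gap between them indicates that the hull lies strictly above $E_0$ at the maximizing $\rho$.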
It follows from (17) that $E_J^{Cs} \ge E_J^{G}$, with the inequality possibly being strict. By inspection of
(5) and (9), one can check that $E_J^{Cs}$ and $E_J^{G}$ coincide when the optimizing $\rho$'s of the source and
channel-coding exponent in (5) and (9) coincide for the value of $R$ optimizing (16).

Note that in channel coding there is no gap between $E_J^{Cs}$ and $E_J^{G}$. In this case the error
exponent is given by

$E_J^{G} = \max_{\rho \in [0,1]} \{E_0(\rho, P_{Y|X}) - \rho R\}$. (18)

Since the concave-hull operator does not affect the maximum of a function, and since the
concave hull of $E_0(\rho, P_{Y|X}) - \rho R$ is $\bar{E}_0(\rho, P_{Y|X}) - \rho R$, it follows that

$E_J^{G} = \max_{\rho \in [0,1]} \{\bar{E}_0(\rho, P_{Y|X}) - \rho R\} = E_J^{Cs}$. (19)

To validate the optimality of $E_J^{Cs}$, Csiszar derived a sphere-packing upper bound on the
exponent [2, Lemma 2],

$E_J \le \min_{tH(V) \le R \le R_V} \left\{ t e\left(\frac{R}{t}, P_V\right) + E_{sp}(R, P_{Y|X}) \right\}$. (20)

When the minimum on the right-hand side (RHS) of (20) is attained for a value of $R$ such that
$E_{sp}(R, P_{Y|X}) = E_r(R, P_{Y|X})$, the upper bound (20) coincides with the lower bound (16) and,
hence, Csiszar's exponent equals the reliability function, i.e., $E_J^{Cs} = E_J$. This is the case for
values of $R$ above the critical rate of the channel $R_{cr}$ [2].

In the next section, we derive a random-coding bound using Gallager's bounding techniques
that recovers $E_J^{Cs}$ with codebooks generated by product distributions, thus providing the reliability
function in the cases where Csiszar's exponent is tight.

III. Random-Coding Bound and Achievable Exponents 

Often, random-coding techniques are used to derive upper bounds on the error probability 
by considering an ensemble of codebooks in which codewords are generated with a certain 
probability distribution, and by analyzing the error probability averaged over all codes in the 
ensemble, $\bar{\epsilon} = E[\epsilon]$. This argument proves the existence of at least one codebook in the ensemble
with error probability $\epsilon \le \bar{\epsilon}$.

The ensemble-average error probability $\bar{\epsilon}$ can be bounded by extending the random-coding
union (RCU) bound derived by Polyanskiy et al. [14, Th. 16] as well as a lower bound by 
Scarlett et al. [15, Th. 1] to joint source-channel coding. 
Proposition 1 (RCU bound for source-channel coding): Consider the ensemble of codebooks
where every message $v \in \mathcal{V}^k$ is mapped onto a codeword $X$ that is independently generated
according to a given $P_{X|V}(X | V = v)$. Then, the ensemble-average error probability $\bar{\epsilon}$ assuming
MAP decoding satisfies

$\frac{1}{4} \mathrm{RCU} \le \bar{\epsilon} \le \mathrm{RCU}$, (21)

where

$\mathrm{RCU} \triangleq \sum_{v,x,y} P_V^k(v) P_{X|V}(x|v) P_{Y|X}^n(y|x)$
$\quad \times \min\left\{1, \sum_{\bar{v} \ne v} \Pr\left\{ P_V^k(\bar{v}) P_{Y|X}^n(y|\bar{X}) \ge P_V^k(v) P_{Y|X}^n(y|x) \right\} \right\}$. (22)

Here, the probability on the RHS of (22) is computed with respect to $\bar{X} \sim P_{X|V}(\bar{X} | V = \bar{v})$
and the outer sum is the expectation taken jointly over the variables $(V, X, Y) \sim P_V^k P_{X|V} P_{Y|X}^n$.

Proof: The proof of the upper bound in (21) is an extension of the proof of the RCU bound
for channel coding [14, Th. 16] when the codewords are independently generated according to
a given $P_{X|V}$ and averaging over $V$. The proof of the lower bound in (21) follows the same
steps as in [15] for the channel-coding case. ■

One can derive the error exponent of a certain codebook ensemble (random-coding error
exponent) by studying the exponential behavior of (22). These exponents will help us determine
the ensemble properties, such as codeword dependence, codeword distribution or codeword
composition, required to attain the sphere-packing bound.

We shall consider the following family of random-coding ensembles:

1) Define a partition $\mathcal{P}_k$ of the source-message set $\mathcal{V}^k$ into $N_k$ disjoint, nonempty subsets
$\mathcal{A}_k^{(i)}$, $i = 1, \ldots, N_k$, such that $\bigcup_{i=1}^{N_k} \mathcal{A}_k^{(i)} = \mathcal{V}^k$. We shall refer to these subsets as classes.

2) A channel input distribution $P_X^{(i)}$ is assigned to each class $\mathcal{A}_k^{(i)}$. For each source message
$v \in \mathcal{A}_k^{(i)}$, we randomly and independently generate codewords $x(v) \in \mathcal{X}^n$ according to
$P_X^{(i)}$.

The above construction includes known ensembles such as i.i.d. or fixed-composition codebooks
as well as ensembles where the codewords are non-identically distributed, i.e., generated
with more than one product distribution or drawn over more than one codeword-type class.
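Theorem 1: For every partition $\mathcal{P}_k$ and for every set of channel-input distributions $P_X^{(i)}(x)$,
$i = 1, \ldots, N_k$, there exists a codebook satisfying

$\epsilon \le \epsilon_B(\mathcal{P}_k) \triangleq h(k) \sum_{i=1}^{N_k} \exp\left( -\max_{\rho_i \in [0,1]} \left\{ E_0(\rho_i, P_{Y|X}^n, P_X^{(i)}) - E_s^{(i)}(\rho_i, P_V^k) \right\} \right)$, (23)

where $h(k) = \frac{3N_k - 1}{2}$ and

$E_s^{(i)}(\rho, P_V^k) \triangleq \log \left( \sum_{v \in \mathcal{A}_k^{(i)}} P_V^k(v)^{\frac{1}{1+\rho}} \right)^{1+\rho}$. (24)

Proof: See Section III-D. ■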
Theorem 1 extends [3, Th. 5.6.2] to codebook ensembles where codewords are independently 
but not necessarily identically distributed. Furthermore, Theorem 1 holds for general (not neces- 
sarily memoryless) discrete channels and sources, and naturally extends to continuous channels 
by following the same arguments as those extending Gallager's exponent for channel coding. 

A. Particularization to specific random-coding ensembles 

Choosing in Theorem 1 the sequence of partitions $\mathcal{P}_k$ such that $N_k = 1$ and $\mathcal{A}_k^{(1)} = \mathcal{V}^k$ for
some $(k, n)$, we obtain $E_s^{(1)}(\rho, P_V^k) = E_s(\rho, P_V^k)$. Thus, with a product distribution $P_X^{(1)}(x) =
\prod_{j=1}^{n} P_X^{(1)}(x_j)$, Theorem 1 recovers Gallager's bound (13). If we optimize the bound (23) for
this choice of $\mathcal{P}_k$ over $P_X^{(1)}$, and let $\lim_{n\to\infty} k/n = t$, Theorem 1 recovers Gallager's bound
on the error exponent (14). Furthermore, for $P_V^k(v) = e^{-nR_c}$, $v \in \mathcal{V}^k$, we recover the
channel-coding random-coding exponent $E_r(R_c, P_{Y|X})$ (9), while for a given source-coding rate $R_s$ and
the identity channel we obtain a lower bound on the source-coding reliability function (5) [11].

We next show that $E_J^{G}$ is tight with respect to the ensemble.

Theorem 2: Consider the codebook ensemble in which the codewords are randomly chosen
according to $P_X^{(1)}(x) = \prod_{j=1}^{n} P_X^{(1)}(x_j)$, where $P_X^{(1)}$ maximizes $E_0(\rho, P_{Y|X}, P_X^{(1)})$. The
random-coding error exponent for this ensemble is given by $E_J^{G}$, defined in (14).

Proof: The achievability part follows from Theorem 1. The converse follows from Appendix
II using the optimal choice of $P_X^{(1)}$. ■

With a more judicious choice of the random-coding ensemble, (23) recovers Csiszar's lower
bound on the exponent (16). This is achieved by identifying the classes $\mathcal{A}_k^{(1)}, \ldots, \mathcal{A}_k^{(N_k)}$ with the
source-type classes $\mathcal{T}_1, \ldots, \mathcal{T}_{N_k}$ (where a source-type class $\mathcal{T}_i$ is defined as the set of all source messages
$v \in \mathcal{V}^k$ with type $P_i$ [7, Def. 2.1]), and by considering the set of product distributions

$P_X^{(i)}(x) = \prod_{j=1}^{n} P_X^{(i)}(x_j)$, $x \in \mathcal{X}^n$, $i = 1, \ldots, N_k$. (25)

For this random-coding ensemble with $P_X^{(i)}$ maximizing $E_0(\rho_i, P_{Y|X}, P_X^{(i)})$, (23) attains Csiszar's
exponent. This exponent is tight with respect to the ensemble.

Theorem 3: Consider the codebook ensemble in which the codewords assigned to a source
message belonging to a source-type class $\mathcal{T}_i$, $i = 1, \ldots, N_k$, are randomly chosen according
to $P_X^{(i)}(x) = \prod_{j=1}^{n} P_X^{(i)}(x_j)$. For $P_X^{(i)}$ maximizing $E_0(\rho_i, P_{Y|X}, P_X^{(i)})$, the random-coding error
exponent for this ensemble is $E_J^{Cs}$, defined in (16).

Proof: We refer to Appendix II for a proof of the converse and to Appendix III for the
proof of the achievability part. ■
Hence, the ensemble-tight error exponent given by Theorem 3 gives the source-channel reliability
function when the minimum in (16) is attained at a rate above the channel critical rate.

B. Attaining Csiszar's exponent with two classes 

Since the number of source-type classes grows polynomially in k [7], the number of classes 
used to attain Gallager's and Csiszar's exponent ranges from one in Theorem 2 to a polynomial 
function of k in Theorem 3. This raises the question of how many channel input distributions 
are needed to attain the best exponent. We next show that by optimally choosing the partition
$\mathcal{P}_k$, an ensemble with only two classes, with their associated input product distributions, suffices
to obtain random-coding bounds that attain $E_J^{Cs}$.

Let $\mathcal{T}(v)$, $v \in \mathcal{V}^k$, denote the source-type class of $v$, i.e., the set of all source messages
having the same type as $v$. Then, we define the partition $\mathcal{P}_k(R_0)$ as follows. For some $R_0 \ge 0$, we
assign the source messages to the two classes

$\mathcal{A}_k^{(1)}(R_0) \triangleq \{v : |\mathcal{T}(v)| \ge e^{nR_0}\}$, (26)
$\mathcal{A}_k^{(2)}(R_0) \triangleq \{v : |\mathcal{T}(v)| < e^{nR_0}\}$, (27)

where $|\mathcal{T}(v)|$ denotes the cardinality of $\mathcal{T}(v)$. Note that $\mathcal{A}_k^{(1)}(R_0)$ is non-empty as long as
$e^{nR_0}$ is smaller than or equal to the largest cardinality of a source-type class, while $\mathcal{A}_k^{(2)}(R_0)$ is
nonempty as long as $e^{nR_0} > 1$. We define

$\underline{E}_B(R_0) \triangleq \liminf_{n\to\infty} -\frac{1}{n} \log \epsilon_B(\mathcal{P}_k(R_0))$, (28)

$\overline{E}_B(R_0) \triangleq \limsup_{n\to\infty} -\frac{1}{n} \log \epsilon_B(\mathcal{P}_k(R_0))$, (29)

where $\epsilon_B(\cdot)$ is defined in (23).
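The two-class partition (26)-(27) is easy to enumerate for a binary memoryless source, where a type class is determined by the number of ones $w$ in the message and has cardinality $\binom{k}{w}$. The following sketch builds both classes and evaluates the class source function (24) by summing over type classes. It is a minimal illustration under the assumption of a binary source; the function names and the representation of a class as a list of Hamming weights are choices made for this example.

```python
import math

def two_class_partition(k, n, r0):
    """Split binary type classes (indexed by Hamming weight w) as in (26)-(27).

    Class 1 collects types with |T| >= e^{n R0}; class 2 collects the rest.
    """
    threshold = math.exp(n * r0)
    class1 = [w for w in range(k + 1) if math.comb(k, w) >= threshold]
    class2 = [w for w in range(k + 1) if math.comb(k, w) < threshold]
    return class1, class2

def class_source_function(rho, k, p_one, weights):
    """E_s^{(i)}(rho, P_V^k) of (24), in nats, for a class given by Hamming weights.

    p_one is the probability of the symbol 1 under the binary source P_V.
    """
    s = 1.0 / (1.0 + rho)
    total = sum(
        math.comb(k, w) * (p_one ** w * (1.0 - p_one) ** (k - w)) ** s
        for w in weights
    )
    return (1.0 + rho) * math.log(total)
```

When a single class contains every weight, `class_source_function` reduces to $k E_s(\rho, P_V)$, which matches the single-class particularization of Theorem 1.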

Theorem 4: Consider the family of partitions $\{\mathcal{P}_k(R_0), R_0 \in [0, R_V]\}$ for every $k \ge 1$. There
exists $R_0 \in [0, R_V]$ and a set of product distributions $P_X^{(i)}(x)$, $i = 1, 2$, such that

$\underline{E}_B(R_0) = \overline{E}_B(R_0) = E_J^{Cs}$. (30)



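Proof: The lower bound $\underline{E}_B(R_0) \ge E_J^{Cs}$ is shown in Appendix IV, and the upper bound
$\overline{E}_B(R_0) \le E_J^{Cs}$ follows from the converse in Appendix II. ■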



While the proof of Theorem 4 assumes random coding over product distributions, the result 
can also be extended to fixed-composition ensembles. It follows that a (conditionally) fixed- 
composition code also achieves $E_J^{Cs}$ by using only two optimally chosen codeword types.

Appendix II shows that ensembles drawn from more than two product distributions do not
increase the exponent beyond $E_J^{Cs}$ even in those cases where $E_J^{Cs}$ does not coincide with
the sphere-packing exponent. However, there may be other ensembles that could improve the 
error exponent by introducing dependence over codeword pairs. For example, the expurgation 
technique [3] eliminates bad pairs of codewords from the code, introducing dependence among 
codewords. Nevertheless, this problem lies beyond the scope of this paper. 
C. Example: a 6-input 4-output channel

Consider the source-channel pair¹ composed by a binary memoryless source (BMS) and a
non-symmetric memoryless channel with $|\mathcal{X}| = 6$, $|\mathcal{Y}| = 4$ and transition-probability matrix

$P_{Y|X} = \begin{pmatrix}
1-3\xi_1 & \xi_1 & \xi_1 & \xi_1 \\
\xi_1 & 1-3\xi_1 & \xi_1 & \xi_1 \\
\xi_1 & \xi_1 & 1-3\xi_1 & \xi_1 \\
\xi_1 & \xi_1 & \xi_1 & 1-3\xi_1 \\
\frac{1}{2}-\xi_2 & \frac{1}{2}-\xi_2 & \xi_2 & \xi_2 \\
\xi_2 & \xi_2 & \frac{1}{2}-\xi_2 & \frac{1}{2}-\xi_2
\end{pmatrix}$. (31)

This channel is similar to the channel given in [3, Fig. 5.6.5] and studied in [4] for source-channel
coding. It is composed of two quaternary-output sub-channels: one of them is a quaternary-input
symmetric channel with parameter $\xi_1$ and the other one is a binary-input channel with parameter
$\xi_2$. We set $\xi_1 = 0.065$, $\xi_2 = 0.01$, $t = 2$ and $P_V(1) = 0.028$. Therefore, the source entropy is
$H(V) = 0.1843$ bits/source symbol, the channel capacity is $C = 0.9791$ bits/channel use and
the critical rate is $R_{cr} = 0.4564$ bits/channel use. Let $R^*$ denote the value of $R$ minimizing
(16). In our example we have $R^* = 0.6827 > R_{cr}$ and $E_J^{Cs}$ is tight.
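The entropy and capacity figures quoted for this example can be checked numerically. The sketch below runs a plain Blahut-Arimoto iteration on a 6-input 4-output matrix. The matrix entries are an assumption of this illustration: four rows with $1 - 3\xi_1$ on the diagonal and $\xi_1$ elsewhere, and two rows with entries $\frac{1}{2} - \xi_2$ and $\xi_2$ in pairs, with $\xi_1 = 0.065$ and $\xi_2 = 0.01$. The function names and the iteration count are likewise illustrative.

```python
import math

def binary_entropy_bits(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def output_distribution(p_x, channel):
    return [sum(p_x[x] * channel[x][y] for x in range(len(p_x)))
            for y in range(len(channel[0]))]

def mutual_information_bits(p_x, channel):
    q_y = output_distribution(p_x, channel)
    return sum(
        p_x[x] * channel[x][y] * math.log2(channel[x][y] / q_y[y])
        for x in range(len(p_x)) for y in range(len(q_y))
        if p_x[x] > 0 and channel[x][y] > 0
    )

def blahut_arimoto(channel, iterations=500):
    """Return (input distribution, mutual information in bits) after the iterations."""
    m = len(channel)
    p_x = [1.0 / m] * m
    for _ in range(iterations):
        q_y = output_distribution(p_x, channel)
        # Divergence D(P_{Y|X=x} || q_Y) in nats for every input x.
        d = [sum(w * math.log(w / q) for w, q in zip(row, q_y) if w > 0)
             for row in channel]
        unnorm = [p * math.exp(dx) for p, dx in zip(p_x, d)]
        z = sum(unnorm)
        p_x = [u / z for u in unnorm]
    return p_x, mutual_information_bits(p_x, channel)

# Assumed channel matrix for the example (rows are inputs, columns are outputs).
XI1, XI2 = 0.065, 0.01
CHANNEL = [
    [1 - 3 * XI1, XI1, XI1, XI1],
    [XI1, 1 - 3 * XI1, XI1, XI1],
    [XI1, XI1, 1 - 3 * XI1, XI1],
    [XI1, XI1, XI1, 1 - 3 * XI1],
    [0.5 - XI2, 0.5 - XI2, XI2, XI2],
    [XI2, XI2, 0.5 - XI2, 0.5 - XI2],
]
```

Under these assumptions, `binary_entropy_bits(0.028)` and the mutual information returned by `blahut_arimoto(CHANNEL)` can be compared with the values of $H(V)$ and $C$ stated above, and the optimizing input distribution places vanishing mass on the two binary-sub-channel inputs.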

As Gallager observed, optimizing the function $E_0$ over the input distributions may lead to
a discontinuity of the derivative of the resulting function with respect to $\rho$: in our example,
the optimal distribution $P_X$ at $\rho = 0.31$ changes from $(\frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4}, 0, 0)$ to $(0, 0, 0, 0, \frac{1}{2}, \frac{1}{2})$.
This implies that $E_0(\rho, P_{Y|X})$ is not concave in $\rho \in [0, 1]$. It follows that, for the $\rho$ maximizing
$\bar{E}_0(\rho, P_{Y|X}) - tE_s(\rho, P_V)$, we have that $\bar{E}_0(\rho, P_{Y|X}) > E_0(\rho, P_{Y|X})$ and, consequently, $E_J^{Cs} > E_J^{G}$.

In Fig. 2 we plot the arguments in (14) (Gallager's exponent $E_J^{G}$) and (17) (Csiszar's exponent
$E_J^{Cs}$) as functions of $\rho$, respectively. For the two-class partition (26)-(27), the figure also shows the
bracketed terms in (23) as a function of $\rho_i$. In our example the threshold $R_0 = R^* = 0.6827$ bits
gives rise to a partition achieving $E_J^{Cs}$. The overall error exponent of the two-class construction
is obtained by first individually maximizing the exponent of each of the curves over $\rho_i$, and by
then choosing the minimum of the two individual maxima. For reference purposes, we also show
the values of $E_J^{G}$ and $E_J^{Cs}$ with horizontal solid lines. The figure shows how the non-concavity of
Gallager's function around the optimal $\rho$ of Csiszar's function translates into a loss in exponent.
For the two-class construction with $R_0 = 0.6827$, the exponent of both classes coincides with
$E_J^{Cs}$. The overall exponent is thus given by $E_J^{Cs}$, which is in agreement with Theorem 4.

¹ In this section we will assume that the logarithms and exponentials are computed to base 2. Hence all the information
quantities related to this example are expressed in bits.

Figure 2. Error exponent bounds. Csiszar's and Gallager's curves correspond to $\bar{E}_0(\rho, P_{Y|X}) - tE_s(\rho, P_V)$ and
$E_0(\rho, P_{Y|X}) - tE_s(\rho, P_V)$, respectively. Single-class curves correspond to
$E_0(\rho, P_{Y|X}) - \lim_{n\to\infty} \frac{1}{n} E_s^{(i)}(\rho, P_V^k)$, for $i = 1, 2$, when $R_0 = R^*$.

Note that the two-class partition characterized in (26) and (27) does not achieve $E_J^{Cs}$ for every
value of $R_0$. Fig. 3 shows the error exponents corresponding to each class for a suboptimal
$R_0 = 0.72$ bits. The overall error exponent corresponds to the smallest of the two individual
maxima, shown in the figure with a circle. When the partition is not optimally chosen, the
exponent of one class increases at the cost of lowering the other, hence resulting in a worse
overall error exponent.

Figure 3. Error exponent bounds. Curves correspond to $E_0(\rho, P_{Y|X}, P_X^{(i)}) - \lim_{n\to\infty} \frac{1}{n} E_s^{(i)}(\rho, P_V^k)$,
for $i = 1, 2$, when $R_0 = 0.72$ is suboptimally chosen and $P_X^{(i)}$ take the values
$P_X^{(1)} = (\frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4}, 0, 0)$, $P_X^{(2)} = (0, 0, 0, 0, \frac{1}{2}, \frac{1}{2})$.

We next show how the upper bounds on the error probability behave as functions of the block
length $n$. Fig. 4 plots $\epsilon_B$ (23) optimized over distributions and two-class partitions, along with
Gallager's upper bound (13), the RCU bounds for an i.i.d. ensemble (either with input distribution
equal to $P_X^{(1)} = (\frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4}, 0, 0)$ or $P_X^{(2)} = (0, 0, 0, 0, \frac{1}{2}, \frac{1}{2})$) and the RCU bound for a construction
employing two classes, similar to the one leading to Theorem 4. While Gallager's upper bound
is tighter than $\epsilon_B$ for the block lengths considered, $\epsilon_B$ has a steeper slope. Furthermore, the
single-class RCU bounds attain the same asymptotic slope as Gallager's bound. The figure
further shows that the RCU bound with two classes decays faster than the RCU bounds assuming
a single distribution.

Figure 4. Random coding upper bounds on the error probability $\epsilon$.



D. Proof of Theorem 1 

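To prove Theorem 1, we apply the RCU bound (22) together with the proposed code construction
to obtain

$\bar{\epsilon} \le \sum_{i=1}^{N_k} \sum_{v \in \mathcal{A}_k^{(i)}} P_V^k(v) \sum_{x,y} P_X^{(i)}(x) P_{Y|X}^n(y|x)$
$\quad \times \min\left\{1, \sum_{j=1}^{N_k} \sum_{\bar{v} \in \mathcal{A}_k^{(j)}, \bar{v} \ne v} \Pr\left\{ P_V^k(\bar{v}) P_{Y|X}^n(y|\bar{X}) \ge P_V^k(v) P_{Y|X}^n(y|x) \right\} \right\}$ (32)

$\phantom{\bar{\epsilon}} \le \sum_{i=1}^{N_k} \sum_{v \in \mathcal{A}_k^{(i)}} P_V^k(v) \sum_{x,y} P_X^{(i)}(x) P_{Y|X}^n(y|x)$
$\quad \times \min\left\{1, \sum_{j=1}^{N_k} \sum_{\bar{v} \in \mathcal{A}_k^{(j)}, \bar{v} \ne v} \sum_{\bar{x}} P_X^{(j)}(\bar{x}) \left( \frac{P_V^k(\bar{v}) P_{Y|X}^n(y|\bar{x})}{P_V^k(v) P_{Y|X}^n(y|x)} \right)^{s_j} \right\}$, (33)

where (33) follows from Markov's inequality for $s_j \ge 0$, $j = 1, \ldots, N_k$ [3]. Using the inequality
$\min\{1, A + B\} \le \min\{1, A\} + \min\{1, B\}$, $A, B \ge 0$, we upper-bound (33) as

$\bar{\epsilon} \le \sum_{i,j=1}^{N_k} \epsilon(i, j)$, (34)

where $\epsilon(i, j)$ is given by

$\epsilon(i, j) \triangleq \sum_{v \in \mathcal{A}_k^{(i)}} P_V^k(v) \sum_{x,y} P_X^{(i)}(x) P_{Y|X}^n(y|x)$
$\quad \times \min\left\{1, \sum_{\bar{v} \in \mathcal{A}_k^{(j)}, \bar{v} \ne v} \sum_{\bar{x}} P_X^{(j)}(\bar{x}) \left( \frac{P_V^k(\bar{v}) P_{Y|X}^n(y|\bar{x})}{P_V^k(v) P_{Y|X}^n(y|x)} \right)^{s_j} \right\}$. (35)

Using the inequality $\min\{1, A\} \le A^{\rho}$, for $A \ge 0$ and $0 \le \rho \le 1$ [3], it follows that for $\rho_{ij} \in [0, 1]$
and $s_j \ge 0$, $i, j = 1, \ldots, N_k$, the term $\epsilon(i, j)$ is upper bounded by

$\epsilon(i, j) \le \sum_{v \in \mathcal{A}_k^{(i)}} P_V^k(v) \sum_{x,y} P_X^{(i)}(x) P_{Y|X}^n(y|x)$
$\quad \times \left( \sum_{\bar{v} \in \mathcal{A}_k^{(j)}} \sum_{\bar{x}} P_X^{(j)}(\bar{x}) \left( \frac{P_V^k(\bar{v}) P_{Y|X}^n(y|\bar{x})}{P_V^k(v) P_{Y|X}^n(y|x)} \right)^{s_j} \right)^{\rho_{ij}}$ (36)

$\phantom{\epsilon(i, j)} = \sum_{y} \left( \sum_{v \in \mathcal{A}_k^{(i)}} P_V^k(v)^{1 - \rho_{ij} s_j} \sum_{x} P_X^{(i)}(x) P_{Y|X}^n(y|x)^{1 - \rho_{ij} s_j} \right)$
$\quad \times \left( \sum_{\bar{v} \in \mathcal{A}_k^{(j)}} P_V^k(\bar{v})^{s_j} \sum_{\bar{x}} P_X^{(j)}(\bar{x}) P_{Y|X}^n(y|\bar{x})^{s_j} \right)^{\rho_{ij}}$. (37)

Choosing $\rho_{ij} = \frac{1 - s_i}{s_j} \in [0, 1]$ for $s_i, s_j \in [\frac{1}{2}, 1]$, and substituting (37) into (34) yields

$\bar{\epsilon} \le \sum_{i,j=1}^{N_k} \sum_{y} G_i(y)^{s_i} G_j(y)^{s_j \rho_{ij}}$, (38)

where

$G_i(y) \triangleq \left( \sum_{v \in \mathcal{A}_k^{(i)}} P_V^k(v)^{s_i} \sum_{x} P_X^{(i)}(x) P_{Y|X}^n(y|x)^{s_i} \right)^{\frac{1}{s_i}}$. (39)

From (38) we have that

$\bar{\epsilon} \le \sum_{i,j=1}^{N_k} \sum_{y} G_i(y)^{s_i} G_j(y)^{1 - s_i}$ (40)

$\phantom{\bar{\epsilon}} \le \sum_{i,j=1}^{N_k} \left( \sum_{y} G_i(y) \right)^{s_i} \left( \sum_{y} G_j(y) \right)^{1 - s_i}$ (41)

$\phantom{\bar{\epsilon}} \le \sum_{i,j=1}^{N_k} \left( s_i \sum_{y} G_i(y) + (1 - s_i) \sum_{y} G_j(y) \right)$ (42)

$\phantom{\bar{\epsilon}} \le \sum_{i=1}^{N_k} \sum_{y} G_i(y) + \sum_{i,j=1, i \ne j}^{N_k} \left( \sum_{y} G_i(y) + \frac{1}{2} \sum_{y} G_j(y) \right)$ (43)

$\phantom{\bar{\epsilon}} = \frac{3N_k - 1}{2} \sum_{i=1}^{N_k} \sum_{y} G_i(y)$, (44)

where in (41) we applied Holder's inequality $\|fg\|_1 \le \|f\|_p \|g\|_q$ with $p = \frac{1}{s_i}$ and $q = \frac{1}{1 - s_i}$; (42)
follows from the inequality between arithmetic and geometric means; and in (43) we used the
bounds $\frac{1}{2} \le s_i \le 1$ in the terms of the sum for which $i \ne j$. By identifying, with $s_i = \frac{1}{1 + \rho_i}$,

$\sum_{y} G_i(y) = \exp\left( -\left( E_0(\rho_i, P_{Y|X}^n, P_X^{(i)}) - E_s^{(i)}(\rho_i, P_V^k) \right) \right)$, (45)

and optimizing over $\frac{1}{2} \le s_i \le 1$, $i = 1, \ldots, N_k$, it follows that

$\bar{\epsilon} \le h(k) \sum_{i=1}^{N_k} \exp\left( -\max_{\rho_i \in [0,1]} \left\{ E_0(\rho_i, P_{Y|X}^n, P_X^{(i)}) - E_s^{(i)}(\rho_i, P_V^k) \right\} \right)$, (46)

which concludes the proof.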
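The elementary inequalities used in the proof can be spot-checked numerically. The sketch below tests $\min\{1, A + B\} \le \min\{1, A\} + \min\{1, B\}$ and $\min\{1, A\} \le A^{\rho}$ on random inputs, and checks that a sum of the form $\sum_{i,j} (s_i g_i + (1 - s_i) g_j)$ with $s_i \in [\frac{1}{2}, 1]$ is at most $\frac{3N - 1}{2} \sum_i g_i$, which is the counting step assumed here to produce the multiplicative factor in the final bound. All names, sample ranges, and the value of that factor are assumptions of this illustration; random testing is a sanity check, not a proof.

```python
import random

def h(num_classes):
    """Assumed multiplicative factor (3N - 1) / 2 for N classes."""
    return (3 * num_classes - 1) / 2

def check_min_inequalities(trials=10000, seed=0):
    """Spot-check min{1, A+B} <= min{1, A} + min{1, B} and min{1, A} <= A**rho."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.uniform(0.0, 3.0), rng.uniform(0.0, 3.0)
        rho = rng.uniform(0.0, 1.0)
        if min(1.0, a + b) > min(1.0, a) + min(1.0, b) + 1e-12:
            return False
        if min(1.0, a) > a ** rho + 1e-12:
            return False
    return True

def check_class_counting(num_classes, trials=1000, seed=1):
    """Spot-check sum_{i,j} (s_i g_i + (1 - s_i) g_j) <= h(N) * sum_i g_i."""
    rng = random.Random(seed)
    for _ in range(trials):
        g = [rng.uniform(0.0, 1.0) for _ in range(num_classes)]
        s = [rng.uniform(0.5, 1.0) for _ in range(num_classes)]
        lhs = sum(s[i] * g[i] + (1.0 - s[i]) * g[j]
                  for i in range(num_classes) for j in range(num_classes))
        if lhs > h(num_classes) * sum(g) + 1e-9:
            return False
    return True
```

With a single class the factor `h(1)` equals 1, which is consistent with the single-class bound reducing to Gallager's bound.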



IV. Lossy joint source-channel coding 

Theorem 1 can be generalized to recover Csiszar's lower bound on the error exponent for
lossy source-channel coding [16]. Suppose that the source messages $v$ generated by a DMS are
decoded at the receiver as $\hat{z} \in \mathcal{Z}^k$, where $\mathcal{Z}$ is a reproduction alphabet, assumed discrete for
ease of exposition. We allow a maximum tolerable distortion $\Delta \ge 0$ between $v$ and $\hat{z}$, given by
the multi-letter function [6], [17]

$d : \mathcal{V}^k \times \mathcal{Z}^k \to \mathbb{R}_+$ (47)

$(v, z) \mapsto d(v, z) \triangleq \frac{1}{k} \sum_{j=1}^{k} d(v_j, z_j)$, (48)

where $d : \mathcal{V} \times \mathcal{Z} \to \mathbb{R}_+$ is some symbol distortion function. Hence, the decoder causes an error
if the output $\hat{z}$ is such that $d(v, \hat{z}) > \Delta$. The probability of excess distortion is then given by

$\epsilon_\Delta \triangleq \Pr\{d(V, \hat{Z}) > \Delta\}$. (49)

Observe that setting the reproduction alphabet $\mathcal{Z} = \mathcal{V}$ and the distortion measure $d(v, z) = 0$
if $v = z$ and $d(v, z) = \infty$ otherwise, any finite $\Delta$ recovers the (almost lossless) source-channel
coding scheme introduced in Section I-A with $\epsilon_\Delta = \epsilon$.

Csiszar proved in [16] the existence of a codebook whose excess-distortion probability satisfies
for every $\delta > 0$ and sufficiently large $n$

CA < ^e~'^^(^'^'^^)-"^'(^^'^^i^)-2'', (50) 



i=l 

k 



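where $N_k$ denotes the number of source-type classes. When $\lim_{n\to\infty} k/n = t$, (50) gives rise to
the achievable exponent

$E_{J,\Delta}^{Cs} \triangleq \inf_{R} \left\{ E_r(R, P_{Y|X}) + tF\left(\frac{R}{t}, \Delta, P_V\right) \right\}$. (51)

The function $F(\frac{R}{t}, \Delta, P_V)$ is the source reliability function for maximum distortion $\Delta$ and
rate $\frac{R}{t}$, given by [18] (see also [7])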



where 



$R(P_Q, \Delta) \triangleq \min_{P_{Z|Q} : E[d(Q,Z)] \le \Delta} I(Q; Z)$ (53)

is the rate-distortion function of a DMS with distribution $P_Q$ [19]. When the infimum of the RHS
of (51) is attained at a rate $R \ge R_{cr}$, the exponent coincides with the sphere-packing exponent
[16, Th. 4], given by

$\inf_{R} \left\{ E_{sp}(R, P_{Y|X}) + tF\left(\frac{R}{t}, \Delta, P_V\right) \right\}$. (54)

The proof of the achievable exponent (51) relies on the following lemma [6] [7, Lemma 9.1]:
Lemma 1 (Type-covering lemma): For every source-type class $\mathcal{T}_i \subseteq \mathcal{V}^k$, $i = 1, \ldots, N_k$,
(associated with type $P_i$), symbol distortion function $d(\cdot, \cdot)$, and numbers $\Delta \ge 0$, $\delta > 0$, there exists
a $k_0(\delta, d, \Delta) \in \mathbb{N}$ and a set $\mathcal{B}_i \subseteq \mathcal{Z}^k$ such that for every $v \in \mathcal{T}_i$ and $k \ge k_0(\delta, d, \Delta)$,

$d(v, \mathcal{B}_i) \triangleq \min_{z \in \mathcal{B}_i} d(v, z) \le \Delta$ (55)

and

$|\mathcal{B}_i| \le e^{k(R(P_i, \Delta) + \delta)}$. (56)

We will refer to the sets $\mathcal{B}_i$, $i = 1, \ldots, N_k$, in Lemma 1 satisfying (55) and (56) as
type-covering sets.

To upper-bound the minimum excess-distortion probability, we consider the following
transmission scheme. For a given $(k, n)$ and an ensemble of type-covering sets, a source encoder
$f_\Delta : \mathcal{V}^k \to \mathcal{Z}^k$ first maps each message $v \in \mathcal{T}_i$ onto a sequence $z \in \mathcal{B}_i$ for which $d(v, z) \le \Delta$.
The sequence $z$ is then transmitted over the channel using a source-channel code. This encoder
induces a message set $f_\Delta(\mathcal{V}^k) \subseteq \mathcal{Z}^k$ with (multi-letter) probability distribution

$P_Z^k(z) = \begin{cases} \sum_{v : f_\Delta(v) = z} P_V^k(v), & z \in f_\Delta(\mathcal{V}^k), \\ 0, & \text{otherwise.} \end{cases}$ (57)

We next apply the random-coding arguments in Section III to show the existence of a good
code. Let $z = f_\Delta(v)$ be the output of the source encoder and let $\hat{z}$ be the output of the MAP
decoder, i.e.,

$\hat{z}(y) = \arg\max_{z} P_Z^k(z) P_{Y|X}^n(y|x(z))$. (58)

It follows that $\epsilon_\Delta$ can be upper-bounded as

$\epsilon_\Delta = \Pr\{d(V, \hat{Z}) > \Delta\} \le \Pr\{\hat{Z} \ne Z\}$ (59)

since, by construction of the source encoder $f_\Delta(\cdot)$, we have that $d(v, f_\Delta(v)) \le \Delta$, $v \in \mathcal{V}^k$.
Thus, $\epsilon_\Delta$ can be upper-bounded by the average error probability for the source-channel coding
problem with source distribution $P_Z^k$. Consequently, Theorem 1 gives an upper bound on (59)
which can be used to derive an achievable excess-distortion exponent. We show in the next
theorem that this exponent is not smaller than (51).

Theorem 5: Consider sequences of partitions of /a(V''') such that the values of k and n satisfy 
the condition that ^ < t for all {k,n) and lim„_5.oo ^ = i- Then, for every A > and sufficiently 
large k, there exists a sequence of partitions with A*"^ classes, Vk = {Ai, . . . , An^,} and N^. < 



such that the ensemble of codebooks generated with product distributions P^ = YYi=i Px ' 



.N^., over the message set /aIV^) achives (when maximized over P^) 



lim inf - log -ca > E^%. (60) 

n—^oo n ' 
Proof: See Appendix V. ■
Theorem 5 may be extended to more general sources and distortion measures that permit a 
type-covering lemma, such as memoryless Gaussian sources with mean-squared error distortion 
[20]. Furthermore, since Theorem 1 directly extends to continuous channel input and output 
alphabets, it may also be extended to more general channels. 

Based on the results from Section III-B, one may wonder whether it suffices to partition $f_\Delta(\mathcal{V}^k)$ into two classes in order to attain the exponent in (60). However, the source reliability function $F(\cdot)$ is not necessarily convex (with respect to $R$) for arbitrary distortion measures, so the proof of Theorem 4 does not necessarily generalize to lossy joint source-channel coding. For those distortion measures for which the function $F(R, \Delta, P_V)$ is convex with respect to $R$ (e.g., Hamming distortion [4]), the very same arguments of Section III-B can be applied to prove that random codes generated with at most two distributions attain the sphere-packing exponent.

V. Conclusions 

We studied the error probability of random-coding ensembles where different codeword dis- 
tributions are assigned to different subsets of source messages. We showed that at most two 
appropriately chosen subsets (with their corresponding optimized product distributions or code- 
word types) suffice to attain the sphere-packing exponent above the critical rate of the channel. 
Our analysis shows that the best known random-coding exponent due to Csiszar can be derived 
by using Gallager's bounding techniques. This permits the generalization of some of the results 
in this paper beyond discrete memoryless channels. 

We further showed that the lossy source-channel coding exponent can be attained by using a
refinement of our code construction that does not necessarily resort to fixed-composition codes. 
If the distortion is such that the source reliability function is convex with respect to the rate, 
then random codes generated with two product distributions are sufficient to attain the lossy 
source-channel exponent. 
Appendix I
Derivation of the concave hull form of 

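The parametric form (17) of Csiszár's exponent can be obtained from (16) as

\begin{align}
E_J^{\mathrm{Cs}} &= \min_{tH(V) \le R \le R_V} \left\{ E_r(R, P_{Y|X}) + t\, e\!\left(\tfrac{R}{t}, P_V\right) \right\} \tag{61} \\
&= \min_{R \ge 0} \left\{ \max_{\rho' \in [0,1]} \left\{ E_0(\rho', P_{Y|X}) - \rho' R \right\} + \max_{\rho \ge 0} \left\{ \rho R - t E_s(\rho, P_V) \right\} \right\} \tag{62} \\
&= \max_{\rho \ge 0} \left\{ \min_{R \ge 0} \left\{ \max_{\rho' \in [0,1]} \left\{ E_0(\rho', P_{Y|X}) + R(\rho - \rho') \right\} \right\} - t E_s(\rho, P_V) \right\} \tag{63} \\
&= \max_{\rho \in [0,1]} \left\{ \min_{R \ge 0} \left\{ \max_{\rho' \in [0,1]} \left\{ E_0(\rho', P_{Y|X}) + R(\rho - \rho') \right\} \right\} - t E_s(\rho, P_V) \right\}, \tag{64}
\end{align}

where in (62) we used the definitions of the source and channel reliability functions and relaxed the minimization interval by noting that $E_r(\cdot)$ is decreasing in $R$, together with the fact that $e(R/t, P_V) = 0$ for $R \le tH(V)$ and $e(R/t, P_V) = \infty$ for $R > R_V$; in (63) we applied Sion's minimax theorem [21], which is valid since the inner function is concave in $\rho$ for fixed $R$ and convex in $R \ge 0$ for fixed $\rho$; and the last step (64) follows from the fact that the function that is maximized over $\rho \ge 0$ in (63) is decreasing for $\rho > 1$.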

Finally, in order to derive (17) from (64), we make use of Lemma 2 below, which is a consequence of the fact that the double conjugate of a function is equal to its convex hull [12, Thm. 12.2].

Lemma 2: For $\rho \in [0, 1]$ it holds that

$$\bar{E}_0(\rho, P_{Y|X}) = \min_{R \ge 0} \max_{\rho' \in [0,1]} \left\{ E_0(\rho', P_{Y|X}) + R(\rho - \rho') \right\}. \tag{65}$$

Proof: The conjugate function of $g(\rho) = -E_0(\rho, P_{Y|X})$, for $\rho \in [0, 1]$, is given by [12]

\begin{align}
g^*(\lambda) &= \max_{\rho' \in [0,1]} \left\{ \lambda \rho' - g(\rho') \right\} \tag{66} \\
&= \max_{\rho' \in [0,1]} \left\{ \lambda \rho' + E_0(\rho', P_{Y|X}) \right\}. \tag{67}
\end{align}

One can check that (67) is bounded for all $\lambda \in \mathbb{R}$ since $E_0(\rho, P_{Y|X})$ is continuous in $\rho \in [0, 1]$. Using that the double conjugate of a function is equal to the convex hull of the original function [12, Thm. 12.2], we have that the concave hull of $E_0(\rho, P_{Y|X})$, $\rho \in [0, 1]$, is given by

\begin{align}
\bar{E}_0(\rho, P_{Y|X}) &= -g^{**}(\rho) \tag{68} \\
&= -\max_{\lambda \in \mathbb{R}} \left\{ \lambda \rho - \max_{\rho' \in [0,1]} \left\{ \lambda \rho' + E_0(\rho', P_{Y|X}) \right\} \right\} \tag{69} \\
&= \min_{\lambda \in \mathbb{R}} \max_{\rho' \in [0,1]} \left\{ E_0(\rho', P_{Y|X}) - \lambda(\rho - \rho') \right\}. \tag{70}
\end{align}

We next show that we can replace the minimization over $\lambda \in \mathbb{R}$ in (70) by a minimization over $\lambda \le 0$. Indeed, suppose that the objective in (70) is minimized for $\lambda > 0$. Since $E_0(\rho', P_{Y|X})$ is nondecreasing in $\rho'$, it follows for all $\rho \in [0, 1]$ that $E_0(\rho', P_{Y|X}) - \lambda(\rho - \rho')$ increases in $\rho' \in [0, 1]$. Hence, for $\lambda > 0$ the value of $\rho'$ maximizing (70) is $\rho' = 1$ and $-\lambda(\rho - \rho') \ge 0$ for every $\rho \in [0, 1]$. Consequently, choosing $\lambda = 0$ does not increase the objective (70), so restricting the minimization over $\lambda \in \mathbb{R}$ to a minimization over $\lambda \le 0$ does not change (70). Lemma 2 follows then by replacing $\lambda$ by $-R$. ■
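As a numerical illustration of Lemma 2, the sketch below compares the min-max expression in (65) with a geometric upper concave envelope on a finite grid of $\rho$ values. The function `e0_toy` is a hypothetical nondecreasing, non-concave stand-in for $E_0(\rho, P_{Y|X})$ and is not taken from the paper; the grid discretization is likewise an assumption made only for this example.

```python
import numpy as np

def e0_toy(rho):
    """Hypothetical nondecreasing, non-concave stand-in for E_0(rho, P_{Y|X})."""
    return np.maximum(0.4 * rho, 0.9 * rho - 0.3)

def hull_by_minmax(rho, grid, values):
    """min over R >= 0 of max over rho' of E_0(rho') + R (rho - rho'), cf. (65)."""
    # The inner maximum is convex and piecewise linear in R; its breakpoints
    # are pairwise slopes between grid points, so those (and R = 0) suffice.
    d_val = values[:, None] - values[None, :]
    d_rho = grid[:, None] - grid[None, :]
    slopes = d_val[d_rho > 0] / d_rho[d_rho > 0]
    candidates = np.concatenate(([0.0], slopes[slopes > 0]))
    inner = np.max(values[None, :] + candidates[:, None] * (rho - grid[None, :]), axis=1)
    return float(inner.min())

def hull_by_geometry(grid, values):
    """Upper concave envelope of the points (grid, values) via a monotone chain."""
    hull = []
    for p in zip(grid, values):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Drop the middle point if it lies on or below the chord to p.
            if (y2 - y1) * (p[0] - x1) <= (p[1] - y1) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append(p)
    hx, hy = zip(*hull)
    return np.interp(grid, hx, hy)

grid = np.linspace(0.0, 1.0, 21)
values = e0_toy(grid)
minmax_hull = np.array([hull_by_minmax(r, grid, values) for r in grid])
geometric_hull = hull_by_geometry(grid, values)
```

For this toy function the envelope is the chord $0.6\rho$, which lies strictly above `e0_toy` in the interior of $[0, 1]$; both computations should return it up to floating-point error.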

Appendix II 

Upper Bounds on Ensemble-Tight Exponents 

In this section we derive upper bounds on the random-coding error exponent for the codebook ensembles described in Section III-A. To this end, we consider codebook ensembles for which, for fixed $(k, n)$, the codewords are generated according to a product distribution $P_X^{(i)}$ that depends on the type class $\mathcal{T}_i$, $i = 1, \ldots, N_k$, of the source message. For messages $v$ in the $i$-th type class, codewords are drawn according to the distribution

$$P_X^{(i)}(x) = \prod_{j=1}^{n} Q_i^k(x_j) \tag{71}$$

for some $Q_i^k \in \mathcal{Q}$, $i = 1, \ldots, N_k$, where $\mathcal{Q}$ is a (non-empty) set of probability distributions on $\mathcal{X}$, i.e., $\mathcal{Q} \subseteq \mathcal{P}(\mathcal{X})$, where $\mathcal{P}(\mathcal{A})$ denotes the set of distributions defined on $\mathcal{A}$. This setup includes the i.i.d. ensemble (with $|\mathcal{Q}| = 1$), the source-type class ensemble (with $\mathcal{Q} = \mathcal{P}(\mathcal{X})$) described in Section III-A, and the two-class ensemble (with $|\mathcal{Q}| = 2$) from Section III-B.

The next theorem bounds the random-coding exponent in terms of $\bar{E}_0(\rho, P_{Y|X}, \mathcal{Q})$, defined as the concave hull of the function $E_0(\rho, P_{Y|X}, \mathcal{Q}) \triangleq \max_{P_X \in \mathcal{Q}} E_0(\rho, P_{Y|X}, P_X)$ in the interval $\rho \in [0, 1]$.
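The quantity $E_0(\rho, P_{Y|X}, \mathcal{Q})$ can be evaluated directly for small alphabets. The sketch below assumes that $E_0(\rho, P_{Y|X}, P_X)$ is the standard Gallager function $-\log \sum_y \big(\sum_x P_X(x) P_{Y|X}(y|x)^{1/(1+\rho)}\big)^{1+\rho}$ (in nats); the function names and the example channels are illustrative choices, not taken from the paper.

```python
import numpy as np

def gallager_e0(rho, channel, p_x):
    """Gallager's E_0 in nats; channel[x, y] = P_{Y|X}(y|x), p_x[x] = P_X(x)."""
    inner = p_x @ channel ** (1.0 / (1.0 + rho))  # one entry per output y
    return float(-np.log(np.sum(inner ** (1.0 + rho))))

def e0_over_set(rho, channel, dist_set):
    """E_0(rho, P_{Y|X}, Q): maximum of E_0 over a finite set Q of input distributions."""
    return max(gallager_e0(rho, channel, p_x) for p_x in dist_set)

# Hypothetical example: a Z-channel and a two-element set Q.
z_channel = np.array([[1.0, 0.0], [0.3, 0.7]])
dist_set = [np.array([0.5, 0.5]), np.array([0.6, 0.4])]
curve = [e0_over_set(rho, z_channel, dist_set) for rho in np.linspace(0.0, 1.0, 11)]
```

Since the pointwise maximum of concave functions need not be concave, the concave hull $\bar{E}_0(\rho, P_{Y|X}, \mathcal{Q})$ is in general a separate step applied to such a curve.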
Theorem 6: For codewords drawn according to (71), the random-coding exponent satisfies

$$\limsup_{n\to\infty} -\frac{\log \bar{\epsilon}}{n} \le \max_{\rho \in [0,1]} \left\{ \bar{E}_0(\rho, P_{Y|X}, \mathcal{Q}) - t E_s(\rho, P_V) \right\}. \tag{72}$$

Proof: See Appendix II-A. ■

When $\mathcal{Q}$ contains only one distribution $P_X$, the concavity of $E_0(\rho, P_{Y|X}, P_X)$ as a function of $\rho$ allows us to simplify (72) to

$$\limsup_{n\to\infty} -\frac{\log \bar{\epsilon}}{n} \le \max_{\rho \in [0,1]} \left\{ E_0(\rho, P_{Y|X}, P_X) - t E_s(\rho, P_V) \right\}. \tag{73}$$

Choosing $P_X$ to be the distribution maximizing the RHS of (73), this upper bound matches the lower bound $E_J^{\mathrm{G}}$ in (14). In other words, if the codebook is drawn according to only one distribution $P_X$, then Gallager's exponent is tight.

By letting $\mathcal{Q} = \mathcal{P}(\mathcal{X})$, Theorem 6 gives

$$\limsup_{n\to\infty} -\frac{\log \bar{\epsilon}}{n} \le \max_{\rho \in [0,1]} \left\{ \bar{E}_0(\rho, P_{Y|X}) - t E_s(\rho, P_V) \right\}. \tag{74}$$

Since the RHS of (74) coincides with $E_J^{\mathrm{Cs}}$ in (17), we conclude that these ensembles have an error exponent that cannot exceed Csiszár's random-coding exponent.
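To make the right-hand side of (73) concrete, the following sketch evaluates it on a grid of $\rho$ for a binary symmetric channel with uniform inputs and a Bernoulli source. It assumes the standard forms $E_0(\rho) = \rho \log 2 - (1+\rho)\log\big(\delta^{1/(1+\rho)} + (1-\delta)^{1/(1+\rho)}\big)$ for the BSC with crossover $\delta$ and $E_s(\rho, P_V) = (1+\rho)\log\sum_v P_V(v)^{1/(1+\rho)}$; the function names, parameters and grid resolution are illustrative assumptions.

```python
import numpy as np

def e0_bsc_uniform(rho, crossover):
    """Gallager's E_0 (nats) for a BSC with uniform input distribution."""
    a = 1.0 / (1.0 + rho)
    return rho * np.log(2.0) - (1.0 + rho) * np.log(crossover ** a + (1.0 - crossover) ** a)

def es_bernoulli(rho, p_source):
    """Source function E_s(rho, P_V) (nats) for a Bernoulli(p_source) source."""
    a = 1.0 / (1.0 + rho)
    return (1.0 + rho) * np.log(p_source ** a + (1.0 - p_source) ** a)

def exponent_single_distribution(crossover, p_source, t, num=2001):
    """Grid approximation of max over rho in [0, 1] of E_0(rho) - t * E_s(rho)."""
    rhos = np.linspace(0.0, 1.0, num)
    objective = e0_bsc_uniform(rhos, crossover) - t * es_bernoulli(rhos, p_source)
    return float(objective.max())

# Hypothetical operating points: a compressible and an incompressible source.
exponent_good = exponent_single_distribution(crossover=0.05, p_source=0.1, t=1.0)
exponent_zero = exponent_single_distribution(crossover=0.05, p_source=0.5, t=1.0)
```

In the second operating point $tH(V) = \log 2$ exceeds the channel capacity, so the objective is maximized at $\rho = 0$ and the bound on the exponent is zero.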

A. Proof of Theorem 6 

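Before proving Theorem 6, we give some definitions. The set $\mathcal{L}_{k,n}(P_{XY})$ is given by

$$\mathcal{L}_{k,n}(P_{XY}) \triangleq \left\{ \bar{P}_{XY} \in \mathcal{P}_n(\mathcal{X} \times \mathcal{Y}) : \bar{P}_Y = P_Y,\ \mathbb{E}[\log P_{Y|X}(\bar{Y}|\bar{X})] \ge \mathbb{E}[\log P_{Y|X}(Y|X)] \right\}, \tag{75}$$

where $(X, Y) \sim P_{XY}$ and $(\bar{X}, \bar{Y}) \sim \bar{P}_{XY}$, $P_Y$ denotes the marginal distribution of $P_{XY}$ [15], and $\mathcal{P}_n(\mathcal{A})$ denotes the set of types in $\mathcal{A}^n$. Analogously, we define the set $\mathcal{L}(P'_{XY})$ as

$$\mathcal{L}(P'_{XY}) \triangleq \left\{ \bar{P}_{XY} \in \mathcal{P}(\mathcal{X} \times \mathcal{Y}) : \bar{P}_Y = P'_Y,\ \mathbb{E}[\log P_{Y|X}(\bar{Y}|\bar{X})] \ge \mathbb{E}[\log P_{Y|X}(Y|X)] \right\}, \tag{76}$$

with $(\bar{X}, \bar{Y}) \sim \bar{P}_{XY}$ and $(X, Y) \sim P'_{XY}$. We denote by $\mathcal{T}(P_{XY})$ the type class of sequences $(x, y)$ with joint type $P_{XY}$.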
According to (21) and (22), we have 

1 

e> 



I f 1 

-^^Py(i;)E mmh,J2P^{Pv{v)PY\x{Y\X)>Pv{v)PY\x{Y\X)\XY^\ , 

(77) 
where we have further lower-bounded $\bar{\epsilon}$ by only summing over those $v$ that are in $\mathcal{T}_i$. As in [15], we rewrite this bound in terms of summations over types with

4 



e > T 52 E ^ Pr{ (X„ Yi) e riPxY)}i^{t, Pxy), (78) 



1=1 Pxy 

where 



Pxy) = min J 1, ^ |7l| Pr{(X„ y) G T{Pxy) lyePy}} 



(79) 



The underlying probability distributions in (78) are given by V ~ Py and (Xj, Yi) ~ PxPy\x', 
and the underlying probability distribution in (79) is Xi ~ . 

Applying the type inequalities [7, Lemma 2.3] and [7, Lemma 2.6] in (78) and (79), we obtain 

Nk 

e > E E ^w{-kD{P,\\Py) - nD{PxY\\Qt^ X P^,^) + - log 4) 

i=l Pxy 

X min I 1, 5^ exp(A;/7(V,) - nD{PxY\\Q^^ x Py) + 6^ I , (80) 

t PxYeCk,„{PxY) J 

where ~ Pi and 5^ „ = log(/i;+l)^'^' (n+l)^!'^!!-^! . The error probability can be further bounded 
by keeping only the leading exponential term in each summation in (80). Taking logarithms on 
both sides of (80), multiplying the result by — ^, and using the notation = max(a;,0) we 
obtain 



log e . . . j k 

< mm mm mm < — ^ ^. ^ ^ , _ ^. ^ _ ^ 



< min min min { -D{Pi\\Pv) + DiPxYllQ^^ x Py\x] 

n i=i-,-,NkPxY PxYeCk,niPxY)[n 



+ 



DiPxYllQ'i^ y<PY)--H{\^) 

Th 



n 



(81) 



where we define 5k,n — '^^'kn + log 4. Here we used that [nx]~^ = n[x]'^, for n > 0, that 
[x]^ = max(0,a;) is monotonically non-decreasing, and that [x + a]+ < [a;]+ + a, a > 0. 

Any distribution in $\mathcal{P}(\mathcal{A})$ over a set $\mathcal{A}$ can be written as the limit of a sequence of types indexed by $n$, where each type belongs to $\mathcal{P}_n(\mathcal{A})$ [8, Sec. IV]. Hence, the uniform continuity of $D(P\|Q)$ over the pair $(P, Q)$ [2] ensures that for every $P'_{XY}$ and every $\xi_1 > 0$, there exists a sufficiently large $n$ such that

< min min _ min \ -D(P,||Py) + D{P':^Y\\Qk^ x Py\x, 



n i=i,-,Nk P'^Y Pxy&c{p'^y) I n 

+ 



D{PxY\\Qf ^PY)-^H{y,) 



H ^1, 

n 



(82) 

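where we have replaced $\mathcal{L}_{k,n}(P'_{XY})$ by $\mathcal{L}(P'_{XY})$ and used that $[x + a]^+ \le [x]^+ + a$, $a \ge 0$. It follows from [15, Thm. 4] that

$$\min_{P'_{XY}} \min_{\bar{P}_{XY} \in \mathcal{L}(P'_{XY})} \left\{ D(P'_{XY} \| P_X \times P_{Y|X}) + \left[ D(\bar{P}_{XY} \| P_X \times P'_Y) - R \right]^+ \right\} = \max_{\rho \in [0,1]} \left\{ E_0(\rho, P_{Y|X}, P_X) - \rho R \right\}, \tag{83}$$

so (82) is equivalent to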

min nD{P,\\Pv)+max\Eo{p,PYix,Q^^)-p-H{vA\-^-^+^^. (84) 
Maximizing (84) over Px E Q for each i = 1, . . . , Nk yields 

< . min^ I -/^(P.llPi^) + max (i?o(p,Py|x, Q) - p-^(^.) j 1 " — + ^i- (85) 



n i=i,...,Nk n pe[o,i] { n ) \ ^ 

Using the uniform continuity of [7] 

max{Eo{p,PY\x,Q)-pR} (86) 

P6[0,l] 

as a function of R, and that any distribution in ViV) can be written as the limit of a sequence 
of source types in k, it follows that for every .^2 > there exists a sufficiently large n such that 

<niinJ -D{Pi,\\Pv) + max (Eo(p,Py|x, Q) - p-H{V')] 1-^ + 6 + 6 (87) 
n Py [n pe[o,i] |^ n J J ^ 

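By taking the limit superior in $n$, subject to the restriction that $\lim_{n\to\infty} \frac{k}{n} = t$, (87) becomes

\begin{align}
\limsup_{n\to\infty} -\frac{\log \bar{\epsilon}}{n} &\le \min_{P_{V'}} \left\{ t D(P_{V'} \| P_V) + \max_{\rho \in [0,1]} \left\{ E_0(\rho, P_{Y|X}, \mathcal{Q}) - \rho t H(V') \right\} \right\} + \xi_1 + \xi_2 \tag{88} \\
&= \min_{0 \le R \le t \log|\mathcal{V}|} \left\{ t\, e\!\left(\tfrac{R}{t}, P_V\right) + \max_{\rho \in [0,1]} \left\{ E_0(\rho, P_{Y|X}, \mathcal{Q}) - \rho R \right\} \right\} + \xi_1 + \xi_2 \tag{89} \\
&= \max_{\rho \in [0,1]} \left\{ \bar{E}_0(\rho, P_{Y|X}, \mathcal{Q}) - t E_s(\rho, P_V) \right\} + \xi_1 + \xi_2, \tag{90}
\end{align}

where $V' \sim P_{V'}$ in (88); (89) follows from (7) with $R = tH(V')$; and (90) can be proved using the very same steps of Appendix I. The result follows by letting $\xi_1$ and $\xi_2$ tend to zero from above. This proves Theorem 6.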
Appendix III

Achievability Proof of Csiszár's Exponent

In the following we show that Csiszar's exponent can be recovered from the upper bound 
(23) by considering product distributions (25) and by identifying the classes with source-type 
classes. The following result will be useful in the derivation of Csiszar's exponent with a product- 
distribution ensemble. 

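Lemma 3: Let the classes $\mathcal{A}_k^{(1)}, \ldots, \mathcal{A}_k^{(N_k)}$ of the partition $\mathcal{P}_k$ be the source-type classes $\mathcal{T}_1, \ldots, \mathcal{T}_{N_k}$. Let $R_i = \frac{k}{n} H(V_i)$, and let $V_i$ be a random variable whose distribution is the type $P_i$ of the class $\mathcal{T}_i$. Then,

$$E_s^{(i)}(\rho, P_V) \le n \rho R_i - k\, e\!\left(\frac{n R_i}{k}, P_V\right). \tag{91}$$

Proof: $E_s^{(i)}(\rho, P_V)$, $i = 1, \ldots, N_k$, can be written as

\begin{align}
E_s^{(i)}(\rho, P_V) &= \log \left( \sum_{v \in \mathcal{T}_i} P_V^k(v)^{\frac{1}{1+\rho}} \right)^{1+\rho} \tag{92} \\
&= \rho \log |\mathcal{T}_i| + \log \left( \sum_{v \in \mathcal{T}_i} P_V^k(v) \right), \tag{93}
\end{align}

where the last step follows since $P_V^k(\cdot)$ is constant within each source-type class $\mathcal{A}_k^{(i)} = \mathcal{T}_i$. The claim follows then by the following inequalities [7, Lemmas 2.3 and 2.6]:

\begin{align}
\frac{1}{n} \log |\mathcal{T}_i| &\le R_i, \tag{94} \\
\log \left( \sum_{v \in \mathcal{T}_i} P_V^k(v) \right) &\le -k D(P_i \| P_V) \tag{95} \\
&\le -k \min_{j = 1, \ldots, N_k :\, H(V_j) \ge H(V_i)} D(P_j \| P_V) \tag{96} \\
&\le -k \min_{Q :\, H(Q) \ge H(V_i)} D(P_Q \| P_V) \tag{97} \\
&= -k\, e\!\left(\frac{n R_i}{k}, P_V\right), \tag{98}
\end{align}

where in (98) we have used the definitions of $R_i$ and of the source reliability function (4).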
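The inequality of Lemma 3 can be checked numerically for a binary memoryless source, where a type class is determined by the number $j$ of ones among $k$ symbols. The sketch below is only an illustration under that binary assumption: it uses the closed form of $e(R, P_V) = \min_{Q : H(Q) \ge R} D(Q\|P_V)$ that holds for binary alphabets, and all function names are hypothetical.

```python
import math

def h2(q):
    """Binary entropy in nats."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log(q) - (1.0 - q) * math.log(1.0 - q)

def kl2(q, p):
    """Binary divergence D(q || p) in nats, for 0 < p < 1."""
    total = 0.0
    for a, b in ((q, p), (1.0 - q, 1.0 - p)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total

def source_reliability(q_type, p):
    """e(H(q_type), P_V) for a binary source: the feasible set is [lo, hi]."""
    lo, hi = min(q_type, 1.0 - q_type), max(q_type, 1.0 - q_type)
    if lo <= p <= hi:
        return 0.0
    return min(kl2(lo, p), kl2(hi, p))

def es_type_class(rho, k, j, p):
    """E_s^{(i)}(rho, P_V) as in (93) for the type class with j ones out of k."""
    log_size = math.log(math.comb(k, j))
    log_mass = log_size + j * math.log(p) + (k - j) * math.log(1.0 - p)
    return rho * log_size + log_mass

def lemma3_rhs(rho, k, j, p):
    """Right-hand side of (91), using n * R_i = k * H(V_i)."""
    q = j / k
    return rho * k * h2(q) - k * source_reliability(q, p)

# Largest violation of (91) over all type classes; it should not be positive.
gaps = [es_type_class(rho, 20, j, 0.2) - lemma3_rhs(rho, 20, j, 0.2)
        for rho in (0.0, 0.5, 1.0) for j in range(21)]
worst_gap = max(gaps)
```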
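Particularizing (23) to product distributions, and optimizing over these distributions, we obtain

\begin{align}
\epsilon_B(\mathcal{P}_k) &= h(k) \sum_{i=1}^{N_k} \exp\left( -n \max_{\rho_i \in [0,1]} \left\{ E_0(\rho_i, P_{Y|X}) - \frac{1}{n} E_s^{(i)}(\rho_i, P_V) \right\} \right) \tag{99} \\
&\le h(k) \sum_{i=1}^{N_k} \exp\left( -n \left( \max_{\rho_i \in [0,1]} \left\{ E_0(\rho_i, P_{Y|X}) - \rho_i R_i \right\} + \frac{k}{n}\, e\!\left(\frac{n R_i}{k}, P_V\right) \right) \right) \tag{100} \\
&= h(k) \sum_{i=1}^{N_k} \exp\left( -n \left( E_r(R_i, P_{Y|X}) + \frac{k}{n}\, e\!\left(\frac{n R_i}{k}, P_V\right) \right) \right) \tag{101} \\
&\le N_k h(k) \exp\left( -n \min_{i = 1, \ldots, N_k} \left\{ E_r(R_i, P_{Y|X}) + \frac{k}{n}\, e\!\left(\frac{n R_i}{k}, P_V\right) \right\} \right) \tag{102} \\
&\le N_k h(k) \exp\left( -n \min_{R \ge 0} \left\{ E_r(R, P_{Y|X}) + \frac{k}{n}\, e\!\left(\frac{n R}{k}, P_V\right) \right\} \right), \tag{103}
\end{align}

where in (100) we used Lemma 3; in (101) we have used the definition of the random-coding channel exponent (9); and (103) follows from relaxing the set $\{R_i\}$, $i = 1, \ldots, N_k$, over which the minimization is performed, to $R \ge 0$.

Using the type counting lemma [7, Lemma 2.2], we have that $N_k \le (k + 1)^{|\mathcal{V}|}$. Hence, for $\lim_{n\to\infty} \frac{k}{n} = t$, and since $E_r(R, P_{Y|X}) + \frac{k}{n} e\!\left(\frac{nR}{k}, P_V\right)$ can be shown to have uniform convergence, it follows from (103) that

$$\liminf_{n\to\infty} -\frac{1}{n} \log \epsilon_B(\mathcal{P}_k) \ge \min_{R \ge 0} \left\{ E_r(R, P_{Y|X}) + t\, e\!\left(\frac{R}{t}, P_V\right) \right\}. \tag{104}$$

Since $E_r(R, P_{Y|X}) + t\, e\!\left(\frac{R}{t}, P_V\right)$ is a decreasing function of $R$ for $0 \le R \le tH(V)$, and since $e\!\left(\frac{R}{t}, P_V\right) = \infty$ for $R > t \log |\mathcal{V}|$, it follows that we can restrict the minimization over $R$ to the interval $[tH(V), R_V]$, so the RHS of (104) is equal to $E_J^{\mathrm{Cs}}$. This proves the achievability part of Theorem 3.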

Appendix IV 
Proof of Theorem 4 

To prove Theorem 4, we first introduce a series of definitions and preliminary results. 

A. Definitions and preliminary results 

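Consider the partition $\mathcal{P}_k(R_0)$ defined by (26) and (27). By noting that $E_s^{(i)}(\rho) = E_s^{(i)}(\rho, P_V)$ is a differentiable function of $\rho$ we define

$$\lambda_i(\rho, n) \triangleq \frac{1}{n} \frac{\partial E_s^{(i)}(\rho)}{\partial \rho}, \quad i = 1, 2. \tag{105}$$

We further define the function

$$T_i(R, \rho) \triangleq \max_{\rho_i \in \mathcal{R}_i(\rho)} \left\{ E_0(\rho_i, P_{Y|X}) + R(\rho - \rho_i) \right\}, \quad R \ge 0,\ \rho \in [0, 1], \tag{106}$$

for $i = 1, 2$, where the intervals $\mathcal{R}_1(\rho)$ and $\mathcal{R}_2(\rho)$ are, respectively,

$$\mathcal{R}_1(\rho) \triangleq [0, \rho] \quad \text{and} \quad \mathcal{R}_2(\rho) \triangleq [\rho, 1]. \tag{107}$$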

Lemma 4: Consider the partition $\mathcal{P}_k(R_0)$. For every $\rho \in [0, 1]$, the limit $\lim_{n\to\infty} \frac{1}{n} E_s^{(i)}(\rho)$ exists and equals

lim -E«(p) = -t min {D{Pe\\Pv) - pH{PM . (108) 

Proof: We first note that every source-type class Te is either a subset of Wj!'^ or of ^ 
For the partition Vk = Vk{Ro), it then follows from [7, Lemmas 2.3 and 2.6] that 



(2) 
•A: • 



lim —EI^\p) = lim — log 

n^CxD fl ' n-inr, n 



1 



/ 



= (1 + p) lim — log 

n— >c« 77, 



/ 



= (1 + p) lim — log p{k) 



g(P^||Py) + H(P^) 

e i+p 



(109) 



(110) 



(111) 



n IS 



where $p(k)$ satisfies $(k + 1)^{-|\mathcal{V}|} \le p(k) \le 1$. Since the exponential decay of the sum is dominated by the exponential decay of the largest summand, we obtain

lim-E«(p) = -t min {D(P^||Py) - pi7(P^)|. (112) 



Lemma 5: Consider the sequence of partitions $\{\mathcal{P}_k(R_0)\}_{k>0}$ defined for $0 \le R_0 \le R_V$. Then, we have

$$\limsup_{n\to\infty} \lambda_2(1, n) \le R_0 \le \liminf_{n\to\infty} \lambda_1(0, n). \tag{113}$$
Proof: By taking the derivatives of $E_s^{(i)}(\rho, P_V)$ in (24) with respect to $\rho$, it follows that $\lambda_i(\rho, n)$ can be written as



Ai(p,n) = } ^log- 



(114) 



Noting that Pvl''^) = is constant over a type class v e Te, and using the definition of Vk{Ro), 
we have that in the limit as n — )■ oo the term Ai(0, n) can be lower bounded as 



lim inf Ai(0, n) 









%' Pf 





7^|P^ 






7^' 





(1) Pe 



(115) 

(116) 
(117) 



where in (116) we used the definition of the partition $\mathcal{P}_k(R_0)$, and (117) follows because the second term in (116) is non-negative.

We next show that $\limsup_{n\to\infty} \lambda_2(1, n) \le R_0$. First note that, by the convexity of $E_s^{(i)}(\rho, P_V)$, we have that $\lambda_2(\rho_0, n) \le \lim_{\rho\to\infty} \lambda_2(\rho, n)$ for any $\rho_0 \ge 0$. Then,



limsup A2(l, ri) < lim sup lim X2{p,n) 

n— >oo n— >oo p— >oo 



lim sup lim — 



\rAP 



1 

i+p 



(2) (TAP II 



1+p 



El-T- I P 1+P 



1 1+p 



lim sup 



1 Y^i-T.r /l(2) I'T^I 



log 



E 1^' 



< lim sup — log 

n— >oo 



^nRo 



E 1 



<Ro+ lim -log(A; + 1)1^1 

n— 5>oo 



(118) 
(119) 

(120) 

(121) 

(122) 
(123) 



where (120) follows from taking the limit as $\rho \to \infty$; in (121) we used the definition of the partition $\mathcal{P}_k(R_0)$; in (122) we used that the number of source type classes is upper-bounded by $(k + 1)^{|\mathcal{V}|}$; and (123) follows from the fact that $(k + 1)^{|\mathcal{V}|}$ is polynomial in $k$ and since $\lim_{n\to\infty} \frac{k}{n} = t$. From the inequalities (115)-(117) and (118)-(123) the result follows. ■
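The role of the type counting bound in the last step can be illustrated numerically. The sketch below counts the types with denominator $k$ over an alphabet of a given size (using the stars-and-bars formula, cross-checked by brute force for small cases) and evaluates the normalized term $\frac{1}{k}\log (k+1)^{|\mathcal{V}|}$, which vanishes as $k$ grows; with $k/n \to t$ the same holds after normalizing by $n$. The function names are illustrative.

```python
import math
from itertools import product

def count_types(k, alphabet_size):
    """Number of types with denominator k (nonnegative integer vectors summing to k)."""
    return math.comb(k + alphabet_size - 1, alphabet_size - 1)

def count_types_brute_force(k, alphabet_size):
    """Same count by direct enumeration; only practical for small k."""
    return sum(1 for counts in product(range(k + 1), repeat=alphabet_size)
               if sum(counts) == k)

def normalized_log_bound(k, alphabet_size):
    """(1/k) * log (k+1)^{|V|} in nats."""
    return alphabet_size * math.log(k + 1) / k

small_case = (count_types(5, 3), count_types_brute_force(5, 3))
decay = [normalized_log_bound(k, 4) for k in (10, 100, 1000, 10**6)]
```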
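Lemma 6: Let $\hat{\rho}$ be the smallest value satisfying

$$\hat{\rho} = \arg\max_{\rho' \in [0,1]} \left\{ E_0(\rho', P_{Y|X}) - \rho' R_V \right\}. \tag{124}$$

Then, for $\rho \in \mathcal{R}_2(\hat{\rho})$, we have

$$\max_{R_0 \in [0, R_V]} \min_{i=1,2} \left\{ T_i(R_0, \rho) \right\} = \min_{R_0 \in [0, R_V]} \max_{i=1,2} \left\{ T_i(R_0, \rho) \right\}. \tag{125}$$

Proof: First we note that $\hat{\rho}$ is guaranteed to exist since $E_0(\rho, P_{Y|X})$ is continuous with respect to $\rho$, and since the maximization in (124) is performed over a compact set. It can be checked that

\begin{align}
T_1(0, \rho) - T_2(0, \rho) &\le 0, \quad \forall \rho \in [0, 1], \tag{126} \\
T_1(R_V, \rho) - T_2(R_V, \rho) &\ge 0, \quad \forall \rho \in [\hat{\rho}, 1], \tag{127}
\end{align}

where (126) follows since $E_0(\rho, P_{Y|X})$ is non-decreasing in $\rho$, and (127) follows since for $\rho \in [\hat{\rho}, 1]$ the derivative of $E_0(\rho, P_{Y|X})$ with respect to $\rho$ is upper bounded by $R_V$.

Note that $T_1(R, \rho)$ is a continuous, non-decreasing function of $R$, while $T_2(R, \rho)$ is a continuous, non-increasing function of $R$. It thus follows from (126) and (127) that $T_1(R, \rho)$ and $T_2(R, \rho)$ cross at $R^*(\rho)$ satisfying

\begin{align}
\max_{R_0 \in [0, R_V]} \left\{ \min_{i=1,2} \left\{ T_i(R_0, \rho) \right\} \right\} &= \min_{i=1,2} \left\{ T_i(R^*(\rho), \rho) \right\} \tag{128} \\
&= \max_{i=1,2} \left\{ T_i(R^*(\rho), \rho) \right\} \tag{129} \\
&= \min_{R_0 \in [0, R_V]} \left\{ \max_{i=1,2} \left\{ T_i(R_0, \rho) \right\} \right\}. \tag{130}
\end{align}

This proves Lemma 6.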



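B. Proof of Theorem 4

Armed with the above three lemmas and Lemma 2 from Appendix I, we proceed to prove the lower bound (30). Using the first-order Taylor-series expansion of $E_s^{(i)}(\rho)$ at a given $\rho_i \in [0, 1]$, we can bound $E_s^{(i)}(\rho)$, for $i = 1, 2$ and $\rho \in [0, 1]$, as

\begin{align}
E_s^{(1)}(\rho) &\ge E_s^{(1)}(\rho_1) + n \lambda_1(\rho_1, n)(\rho - \rho_1) \tag{131} \\
&\ge E_s^{(1)}(\rho_1) + n \lambda_1(0, n)(\rho - \rho_1), \quad \rho_1 \le \rho, \tag{132} \\
E_s^{(2)}(\rho) &\ge E_s^{(2)}(\rho_2) + n \lambda_2(\rho_2, n)(\rho - \rho_2) \tag{133} \\
&\ge E_s^{(2)}(\rho_2) + n \lambda_2(1, n)(\rho - \rho_2), \quad \rho_2 \ge \rho, \tag{134}
\end{align}

where we have used that the functions $E_s^{(i)}(\rho)$, $i = 1, 2$, are convex and non-decreasing with respect to $\rho$, so

$$0 \le \lambda_i(0, n) \le \lambda_i(\rho_1, n) \le \lambda_i(\rho_2, n) \le \lambda_i(1, n), \quad 0 \le \rho_1 \le \rho_2 \le 1. \tag{135}$$

Using Lemmas 4 and 5, together with (132) and (134), we have that

$$\lim_{n\to\infty} \frac{1}{n} E_s^{(i)}(\rho) \ge \lim_{n\to\infty} \frac{1}{n} E_s^{(i)}(\rho_i) + R_0(\rho - \rho_i), \tag{136}$$

for $\rho_i \in \mathcal{R}_i(\rho)$, $i = 1, 2$.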

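Applying the two-class partition defined by (26) and (27) to Theorem 1 yields that, for an arbitrary dummy variable $\rho_0 \in [0, 1]$, $E_B(R_0)$ from (28) can be bounded as

\begin{align}
E_B(R_0) &= \liminf_{n\to\infty} -\frac{1}{n} \log \left( h(k) \sum_{i=1,2} \exp\left( -\max_{\rho_i \in [0,1]} \left\{ n E_0(\rho_i, P_{Y|X}) - E_s^{(i)}(\rho_i) \right\} \right) \right) \tag{137} \\
&= \liminf_{n\to\infty} \min_{i=1,2} \left\{ \max_{\rho_i \in [0,1]} \left\{ E_0(\rho_i, P_{Y|X}) - \frac{1}{n} E_s^{(i)}(\rho_i) \right\} \right\} \tag{138} \\
&\ge \min_{i=1,2} \left\{ \max_{\rho_i \in [0,1]} \left\{ E_0(\rho_i, P_{Y|X}) - \lim_{n\to\infty} \frac{1}{n} E_s^{(i)}(\rho_i) \right\} \right\} \tag{139} \\
&\ge \min_{i=1,2} \left\{ \max_{\rho_i \in \mathcal{R}_i(\rho_0)} \left\{ E_0(\rho_i, P_{Y|X}) - \lim_{n\to\infty} \frac{1}{n} E_s^{(i)}(\rho_i) \right\} \right\} \tag{140} \\
&\ge \min_{i=1,2} \left\{ \max_{\rho_i \in \mathcal{R}_i(\rho_0)} \left\{ E_0(\rho_i, P_{Y|X}) + R_0(\rho_0 - \rho_i) \right\} - \lim_{n\to\infty} \frac{1}{n} E_s^{(i)}(\rho_0) \right\} \tag{141} \\
&= \min_{i=1,2} \left\{ T_i(R_0, \rho_0) - \lim_{n\to\infty} \frac{1}{n} E_s^{(i)}(\rho_0) \right\} \tag{142} \\
&\ge \min_{i=1,2} \left\{ T_i(R_0, \rho_0) \right\} - t E_s(\rho_0, P_V), \tag{143}
\end{align}

where (138) follows by noting that $h(k)$ is subexponential in $k$; in (139) we used that, if the limit $\lim_{n\to\infty} f_n(x)$ exists for every $x$, then $\liminf_{n\to\infty} \max_x \{f_n(x)\} \ge \max_x \{\lim_{n\to\infty} f_n(x)\}$; (140) follows from restricting the intervals over which $\rho_1$ and $\rho_2$ are maximized; in (141) we applied (136) for $i = 1, 2$; (142) follows from the definition of $T_i(R, \rho)$, $i = 1, 2$, cf. (106); and in (143) we used that, by definition, $E_s^{(i)}(\rho) \le E_s^k(\rho, P_V) = k E_s(\rho, P_V)$ since the sum in (24) contains fewer summands than the sum in (6).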

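As (137)-(143) hold for arbitrary $\rho_0 \in [0, 1]$ and $R_0 \in [0, R_V]$, we obtain upon maximizing over $R_0$ and $\rho_0 \in [\hat{\rho}, 1]$ (with $\hat{\rho}$ defined in Lemma 6)

\begin{align}
\max_{R_0 \in [0, R_V]} E_B(R_0) &\ge \max_{\rho_0 \in [\hat{\rho}, 1]} \left\{ \max_{R_0 \in [0, R_V]} \min_{i=1,2} \left\{ T_i(R_0, \rho_0) \right\} - t E_s(\rho_0, P_V) \right\} \tag{144} \\
&= \max_{\rho_0 \in [\hat{\rho}, 1]} \left\{ \min_{R_0 \in [0, R_V]} \max_{i=1,2} \left\{ T_i(R_0, \rho_0) \right\} - t E_s(\rho_0, P_V) \right\}, \tag{145}
\end{align}

where in (145) we applied Lemma 6.

Note that, since $\mathcal{R}_1(\rho_0) \cup \mathcal{R}_2(\rho_0) = [0, 1]$,

\begin{align}
\max_{i=1,2} T_i(R_0, \rho_0) &= \max_{i=1,2} \max_{\rho_i \in \mathcal{R}_i(\rho_0)} \left\{ E_0(\rho_i, P_{Y|X}) + R_0(\rho_0 - \rho_i) \right\} \tag{146} \\
&= \max_{\rho \in [0,1]} \left\{ E_0(\rho, P_{Y|X}) + R_0(\rho_0 - \rho) \right\}, \tag{147}
\end{align}

from which we obtain

\begin{align}
\max_{R_0 \in [0, R_V]} E_B(R_0) &\ge \max_{\rho_0 \in [\hat{\rho}, 1]} \left\{ \min_{R_0 \in [0, R_V]} \max_{\rho \in [0,1]} \left\{ E_0(\rho, P_{Y|X}) + R_0(\rho_0 - \rho) \right\} - t E_s(\rho_0, P_V) \right\} \tag{148} \\
&\ge \max_{\rho_0 \in [\hat{\rho}, 1]} \left\{ \min_{R_0 \ge 0} \max_{\rho \in [0,1]} \left\{ E_0(\rho, P_{Y|X}) + R_0(\rho_0 - \rho) \right\} - t E_s(\rho_0, P_V) \right\} \tag{149} \\
&= \max_{\rho_0 \in [\hat{\rho}, 1]} \left\{ \bar{E}_0(\rho_0, P_{Y|X}) - t E_s(\rho_0, P_V) \right\}, \tag{150}
\end{align}

where (149) follows by relaxing the range over which $R_0$ is optimized; and (150) follows from Lemma 2 in Appendix I.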

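To conclude the proof it remains to show that the range of $\rho_0$ over which the argument of (150) is optimized can be extended to $\rho_0 \in [0, 1]$ without violating the inequality chain (148)-(150). We prove this by contradiction. To this end, assume that there exists a $\rho_0^* < \hat{\rho}$ that satisfies

$$\rho_0^* = \arg\max_{\rho_0 \in [0,1]} \left\{ \bar{E}_0(\rho_0, P_{Y|X}) - t E_s(\rho_0, P_V) \right\}. \tag{151}$$

Since $E_0(\rho, P_{Y|X})$ is a continuous function of $\rho$, the set of maximizers of $E_0(\rho, P_{Y|X}) - \rho R_V$ in $\rho \in [0, 1]$, denoted as $\mathcal{S}^*$, is compact. The set of maximizers of $\bar{E}_0(\rho, P_{Y|X}) - \rho R_V$ is the convex hull of $\mathcal{S}^*$, denoted as $\bar{\mathcal{S}}^*$. From the definition of $\hat{\rho}$ we have that $\hat{\rho} = \min\{\mathcal{S}^*\} = \min\{\bar{\mathcal{S}}^*\}$. Then, it follows that

\begin{align}
\bar{E}_0(\rho_0^*, P_{Y|X}) - t E_s(\rho_0^*, P_V) &= \bar{E}_0(\rho_0^*, P_{Y|X}) - \rho_0^* R_V + \rho_0^* R_V - t E_s(\rho_0^*, P_V) \tag{152} \\
&< \bar{E}_0(\hat{\rho}, P_{Y|X}) - \hat{\rho} R_V + \rho_0^* R_V - t E_s(\rho_0^*, P_V) \tag{153} \\
&\le \bar{E}_0(\hat{\rho}, P_{Y|X}) + R_V(\rho_0^* - \hat{\rho}) - t \left( E_s(\hat{\rho}, P_V) + \frac{\partial E_s(\rho, P_V)}{\partial \rho}\bigg|_{\rho = \hat{\rho}} (\rho_0^* - \hat{\rho}) \right) \tag{154} \\
&\le \bar{E}_0(\hat{\rho}, P_{Y|X}) - t E_s(\hat{\rho}, P_V), \tag{155}
\end{align}

where (153) follows from the definition of $\hat{\rho}$; (154) follows from the convexity of $E_s(\rho, P_V)$; and (155) follows because

$$\frac{\partial E_s(\rho, P_V)}{\partial \rho}\bigg|_{\rho = \hat{\rho}} \le \lim_{\rho' \to \infty} \frac{\partial E_s(\rho', P_V)}{\partial \rho'} = \frac{R_V}{t}. \tag{156}$$

From (152)-(155) it follows that by choosing $\rho_0 = \hat{\rho}$ we would achieve an objective strictly larger than by choosing $\rho_0 = \rho_0^*$, hence contradicting the initial assumption. It thus follows that

$$\max_{\rho_0 \in [\hat{\rho}, 1]} \left\{ \bar{E}_0(\rho_0, P_{Y|X}) - t E_s(\rho_0, P_V) \right\} = \max_{\rho_0 \in [0, 1]} \left\{ \bar{E}_0(\rho_0, P_{Y|X}) - t E_s(\rho_0, P_V) \right\}. \tag{157}$$

Since (157) is equal to (17), this concludes the proof.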

Appendix V 
Proof of Theorem 5 

We fix $\delta > 0$ and associate a type-covering set $\widetilde{\mathcal{T}}_i$ with each source-type class $\mathcal{T}_i$ for $k \ge k_0^{(i)}(\delta, d, \Delta) \in \mathbb{N}$, $i = 1, \ldots, N_k$. Let $k \ge \max_{i = 1, \ldots, N_k} k_0^{(i)}$ satisfy the condition of Lemma 1 for every $i = 1, \ldots, N_k$. We consider the transmission scheme described in Section IV and apply the random-coding arguments of Section III to upper-bound its excess-distortion error probability. To this end, we first index the source-type classes $\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_{N_k}$ in increasing order of cardinality of the corresponding type-covering sets $\widetilde{\mathcal{T}}_1, \widetilde{\mathcal{T}}_2, \ldots, \widetilde{\mathcal{T}}_{N_k}$, such that $|\widetilde{\mathcal{T}}_1| \le |\widetilde{\mathcal{T}}_2| \le \cdots \le |\widetilde{\mathcal{T}}_{N_k}|$. We then bound $\Pr\{\hat{Z} \ne Z\}$ using Theorem 1 together with a partition constructed as follows:

1) set $\mathcal{A}_i = \emptyset$, $i = 1, \ldots, N_k$;

2) if $z$ belongs to a unique $\widetilde{\mathcal{T}}_i$, then assign $z$ to $\mathcal{A}_i$, i.e., $\mathcal{A}_i = \mathcal{A}_i \cup \{z\}$;¹

3) if $z$ belongs to $\widetilde{\mathcal{T}}_j$ for a subset of indices $\mathcal{J}(z) \subseteq \{1, \ldots, N_k\}$, then assign $z$ to the set $\mathcal{A}_j$ with the smallest index, i.e., $\mathcal{A}_{j^*} = \mathcal{A}_{j^*} \cup \{z\}$, where

$$j^* = \min \mathcal{J}(z); \tag{158}$$

4) repeat Steps 2 and 3 until all $z \in f_\Delta(\mathcal{V}^k)$ are assigned to a class.

¹By the construction of $f_\Delta(\cdot)$, every $z \in f_\Delta(\mathcal{V}^k)$ belongs to at least one $\widetilde{\mathcal{T}}_i$.
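The assignment rule in Steps 1)-4) can be sketched in a few lines. In this illustration the type-covering sets are represented as plain Python sets that are assumed to be already ordered by nondecreasing cardinality; the function name and the zero-based indexing are choices made for the example rather than part of the construction in the text.

```python
def build_partition(covering_sets):
    """Assign every element to the covering set of smallest index that contains it.

    covering_sets: list of sets, assumed sorted by nondecreasing cardinality.
    Returns the list of (disjoint) classes and the indices of the non-empty ones.
    """
    classes = [set() for _ in covering_sets]      # Step 1: all classes start empty
    assigned = set()
    for index, cover in enumerate(covering_sets):
        new_elements = cover - assigned           # elements whose smallest index is `index`
        classes[index] |= new_elements            # Steps 2 and 3
        assigned |= new_elements
    non_empty = [i for i, cls in enumerate(classes) if cls]
    return classes, non_empty

# Hypothetical toy example with overlapping covering sets.
example_classes, example_indices = build_partition([{"a"}, {"a", "b"}, {"a", "b"}])
```

Processing the covering sets in index order makes a separate treatment of Steps 2 and 3 unnecessary: an element that belongs to a unique covering set and an element that belongs to several are both placed in the first set in which they appear. Classes that end up empty are dropped from the index set, as in the text.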

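By the above construction, we obtain a non-trivial partition $\mathcal{P}_k = \{\mathcal{A}_i\}_{i \in \mathcal{I}}$, $\mathcal{A}_i \subseteq \widetilde{\mathcal{T}}_i$, for $i \in \mathcal{I}$, where $\mathcal{I} \subseteq \{1, \ldots, N_k\}$ denotes the subset of integers that index non-empty classes. Then, we apply Theorem 1 with $\mathcal{P}_k = \{\mathcal{A}_i\}_{i \in \mathcal{I}}$ and $P_Z$ as the source distribution over $f_\Delta(\mathcal{V}^k)$. Hence, for every set of product distributions $P_X^{(i)}(x) = \prod_{j=1}^{n} P_{X,i}(x_j)$, $i \in \mathcal{I}$, and for every set of parameters $\rho_i \in [0, 1]$, $i \in \mathcal{I}$, the average probability of error is upper-bounded by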
eA<eB(n) = M^)Vexpf- max \ Eo{p^, Py\x, P^i^) ~ E^'^p,, Pz)]] . (159) 
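For the given $\delta > 0$, we define

$$R_i \triangleq \frac{\log |\mathcal{A}_i| - 2k\delta}{n}, \quad i \in \mathcal{I}, \tag{160}$$

and upper-bound

\begin{align}
E_s^{(i)}(\rho, P_Z) &= \log \left( \sum_{z \in \mathcal{A}_i} P_Z(z)^{\frac{1}{1+\rho}} \right)^{1+\rho} \tag{161} \\
&\le \rho \log |\mathcal{A}_i| + \log \left( \sum_{z \in \mathcal{A}_i} P_Z(z) \right) \tag{162} \\
&= n \rho \left( R_i + \frac{2k\delta}{n} \right) + \log \left( \sum_{z \in \mathcal{A}_i} P_Z(z) \right), \quad i \in \mathcal{I}, \tag{163}
\end{align}

where (162) follows from Jensen's inequality [19] applied to the function $f(x) = x^a$, $a \in (0, 1)$, and (163) follows from (160). We note that the second summand in (163) can be written as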

log('5^Pz(2)'j =log['5^ J2 Pvi^)) (164) 

\zeAi / \j=i veTj-. J 



^ J2 Pviv)], I El, (165) 
j=i vaTj-. J 



March 26, 2013 



DRAFT 






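where (165) follows from the class assignment (158). This is further upper-bounded by

\begin{align}
\log \left( \sum_{z \in \mathcal{A}_i} P_Z(z) \right) &\le \log \left( \sum_{j=i}^{N_k} \sum_{v \in \mathcal{T}_j} P_V^k(v) \right) \tag{166} \\
&\le \log \left( \sum_{j=i}^{N_k} \exp\left( -k D(P_j \| P_V) \right) \right) \tag{167} \\
&\le \log \left( (k + 1)^{|\mathcal{V}|} \exp\left( -k \min_{j = i, \ldots, N_k} D(P_j \| P_V) \right) \right) \tag{168} \\
&= \log \left( (k + 1)^{|\mathcal{V}|} \exp\left( -k \min_{j :\, |\widetilde{\mathcal{T}}_j| \ge |\widetilde{\mathcal{T}}_i|} D(P_j \| P_V) \right) \right) \tag{169} \\
&= -k \min_{j :\, |\widetilde{\mathcal{T}}_j| \ge |\widetilde{\mathcal{T}}_i|} D(P_j \| P_V) + |\mathcal{V}| \log(k + 1) \tag{170}
\end{align}

for all $i \in \mathcal{I}$, where (166) results from relaxing the summation in (165); the inequality (167) follows from [7, Lemma 2.6]; (168) follows from the type counting lemma [7, Lemma 2.2]; and (169) follows because the type-covering sets are ordered, so the set of indices $j = i, \ldots, N_k$ is equal to the set $\{j : |\widetilde{\mathcal{T}}_j| \ge |\widetilde{\mathcal{T}}_i|\}$. We further upper-bound (170) as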

logl Y Pziz)] < -k min D{PA\Pv) + \V\log{k + l) (171) 
<-k inf D(PQ||Py) + |V|log(A; + l) (172) 



?:R(Pq,A)>^ 



= -kF A, Pv^ + |V| log(A; + 1), (173) 
for all i E I, where (171) follows by noting that, by (56) and (160), 

It follows from (163) and (173) that the term eP{p, Pz) is upper-bounded by 

Ef\p, Pz) <n(^p (^R, + ^) + 6,n) - kF A, P^^ , le X, (175) 

where ^k,n — By combining (175) with (159), and maximizing over product distribu- 

tions (25), we obtain 

^Bi'Pk) < h{k) g-maxp.g^o i)|n£o(p^,Py|jf)-pi(nfli+2fc(5)}+fcF(^,A,Pv)-ngfc^„ (176) 

= h{k) e-"(^-(^'+'^'^^i^)+^^('^''^'^^)-«^-) (177) 

< iV^/i(A;)e-"''^^«>«{^'(^+'^'^^i^)+^^('^'^'^^)-«*-}, (178) 
where in (177) we have used the definition of the random-coding channel exponent (9), and in (178) we have defined $N_k' = |\mathcal{I}|$ and relaxed the set over which the minimization is performed to all possible values of $R \ge 0$.

Using that $E_r(R, P_{Y|X})$ is a convex function in $R$ [7] (and thus a locally Lipschitz function [22]), we obtain for the sequence of partitions under consideration and $M > 0$ that

1 f kM5 k f nR \ 1 

liminf logeB(Pfc) > lim inf <^ -^.n + [R, Pyix) + -E (—,A,Pv)\ 

(179) 

r kM5 k f R \ 1 

> lim inf<^-a,n + E,{R,Py\x) + -F [-,A,Pv] k (180) 

n-s>oo i?>0 [ 71 ' 71 \t J ) 

where (180) follows from $\frac{k}{n} \le t$ for all $(k, n)$ and the monotonicity of $F(R, \Delta, P_V)$ with respect to $R$. Without loss of generality, we restrict the interval over which the infimum in (180) is computed to a subset $R \in T \subset (0, +\infty)$ in which the function $F\!\left(\frac{R}{t}, \Delta, P_V\right)$ is finitely upper-bounded. Then, we can apply the uniform convergence of the inner function in (180) for every $R \in T$ to obtain

1 r kMS k f R \ 1 

liminf log e^{Vk) > inf <^ lim -^.n + {R, Py\x) + -F -, A, U (181) 

rn-oo 71 -Rer [n-s>oo 71 ^ 71 \ t J ) 

= mf {R, Py\x) + tE (^j, A, Pv^ | - tM6, (182) 

> mf {R, Py\x) + tE (^j, A, Pv^ I - tMS, (183) 

where (182) is a consequence of $\lim_{n\to\infty} \frac{k}{n} = t$ and $\lim_{n\to\infty} \xi_{k,n} = 0$. Theorem 5 follows from (182) by letting $\delta$ tend to zero from above.